To Raise or Not To Raise: The Autonomous Learning Rate Question

Published 16 Jun 2021 in cs.LG | (2106.08767v3)

Abstract: There is a parameter ubiquitous throughout the deep learning world: learning rate. There is likewise a ubiquitous question: what should that learning rate be? The true answer to this question is often tedious and time-consuming to obtain, and a great deal of arcane knowledge has accumulated in recent years about how to pick and modify learning rates to achieve optimal training performance. Moreover, the long hours spent carefully crafting the perfect learning rate can come to nothing the moment your network architecture, optimizer, dataset, or initial conditions change ever so slightly. But it need not be this way. We propose a new answer to the great learning rate question: the Autonomous Learning Rate Controller. Find it at https://github.com/fastestimator/ARC/tree/v2.0

References (47)
  1. Wang, C., Bochkovskiy, A., Liao, H.M.: Scaled-yolov4: Scaling cross stage partial network. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2021, Virtual, June 19-25, 2021, pp. 13029–13038. Computer Vision Foundation / IEEE, virtual (2021). https://doi.org/10.1109/CVPR46437.2021.01283. https://openaccess.thecvf.com/content/CVPR2021/html/Wang_Scaled-YOLOv4_Scaling_Cross_Stage_Partial_Network_CVPR_2021_paper.html (3) Wang, C.-Y., Mark Liao, H.-Y., Wu, Y.-H., Chen, P.-Y., Hsieh, J.-W., Yeh, I.-H.: Cspnet: A new backbone that can enhance learning capability of cnn. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 390–391 (2020) (4) Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach (2020). https://openreview.net/forum?id=SyxS0T4tvS (5) Coleman, C.A., Narayanan, D., Kang, D., Zhao, T., Zhang, J., Nardi, L., Bailis, P., Olukotun, K., Re, C., Zaharia, M.: DAWNBench: An End-to-End Deep Learning Benchmark and Competition. NIPS ML Systems Workshop (2017) (6) Loshchilov, I., Hutter, F.: SGDR: stochastic gradient descent with warm restarts. In: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net, Toulon, France (2017). https://openreview.net/forum?id=Skq89Scxx (7) Goyal, P., Dollár, P., Girshick, R.B., Noordhuis, P., Wesolowski, L., Kyrola, A., Tulloch, A., Jia, Y., He, K.: Accurate, large minibatch SGD: training imagenet in 1 hour. CoRR abs/1706.02677 (2017) arXiv:1706.02677 (8) Smith, L.N., Topin, N.: Super-convergence: Very fast training of residual networks using large learning rates. 
CoRR abs/1708.07120 (2017) arXiv:1708.07120 (9) Vaswani, S., Mishkin, A., Laradji, I., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Advances in Neural Information Processing Systems, pp. 3727–3740 (2019) (10) Mahsereci, M., Hennig, P.: Probabilistic line searches for stochastic optimization. The Journal of Machine Learning Research 18(1), 4262–4320 (2017) (11) Zoph, B., Le, Q.V.: Neural architecture search with reinforcement learning. In: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net, Toulon, France (2017). https://openreview.net/forum?id=r1Ue8Hcxg (12) Pham, H., Guan, M.Y., Zoph, B., Le, Q.V., Dean, J.: Efficient neural architecture search via parameter sharing. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018. Proceedings of Machine Learning Research, vol. 80, pp. 4092–4101. PMLR, Stockholm, Sweden (2018). http://proceedings.mlr.press/v80/pham18a.html (13) Liu, H., Simonyan, K., Yang, Y.: DARTS: differentiable architecture search. In: 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=S1eYHoC5FX (14) Bello, I., Zoph, B., Vasudevan, V., Le, Q.V.: Neural optimizer search with reinforcement learning. In: Precup, D., Teh, Y.W. (eds.) Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017. Proceedings of Machine Learning Research, vol. 70, pp. 459–468. PMLR, Sydney, NSW, Australia (2017). http://proceedings.mlr.press/v70/bello17a.html (15) Xu, Z., Dai, A.M., Kemp, J., Metz, L.: Learning an adaptive learning rate schedule. 
CoRR abs/1909.09712 (2019) arXiv:1909.09712 (16) Shu, J., Zhu, Y., Zhao, Q., Meng, D., Xu, Z.: Meta-LR-Schedule-Net: Learned LR Schedules that Scale and Generalize (2020) (17) Daniel, C., Taylor, J., Nowozin, S.: Learning step size controllers for robust neural network training. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (18) Wu, Y., Ren, M., Liao, R., Grosse, R.B.: Understanding short-horizon bias in stochastic meta-optimization. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, Vancouver, BC, Canada (2018). https://openreview.net/forum?id=H1MczcgR- (19) Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12(61), 2121–2159 (2011) (20) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6980 (21) Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. 
In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. 
https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). 
http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). 
https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
Accessed: 2020-11-04 Wang, C.-Y., Mark Liao, H.-Y., Wu, Y.-H., Chen, P.-Y., Hsieh, J.-W., Yeh, I.-H.: Cspnet: A new backbone that can enhance learning capability of cnn. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 390–391 (2020) (4) Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach (2020). https://openreview.net/forum?id=SyxS0T4tvS (5) Coleman, C.A., Narayanan, D., Kang, D., Zhao, T., Zhang, J., Nardi, L., Bailis, P., Olukotun, K., Re, C., Zaharia, M.: DAWNBench: An End-to-End Deep Learning Benchmark and Competition. NIPS ML Systems Workshop (2017) (6) Loshchilov, I., Hutter, F.: SGDR: stochastic gradient descent with warm restarts. In: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net, Toulon, France (2017). https://openreview.net/forum?id=Skq89Scxx (7) Goyal, P., Dollár, P., Girshick, R.B., Noordhuis, P., Wesolowski, L., Kyrola, A., Tulloch, A., Jia, Y., He, K.: Accurate, large minibatch SGD: training imagenet in 1 hour. CoRR abs/1706.02677 (2017) arXiv:1706.02677 (8) Smith, L.N., Topin, N.: Super-convergence: Very fast training of residual networks using large learning rates. CoRR abs/1708.07120 (2017) arXiv:1708.07120 (9) Vaswani, S., Mishkin, A., Laradji, I., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Advances in Neural Information Processing Systems, pp. 3727–3740 (2019) (10) Mahsereci, M., Hennig, P.: Probabilistic line searches for stochastic optimization. The Journal of Machine Learning Research 18(1), 4262–4320 (2017) (11) Zoph, B., Le, Q.V.: Neural architecture search with reinforcement learning. 
In: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net, Toulon, France (2017). https://openreview.net/forum?id=r1Ue8Hcxg (12) Pham, H., Guan, M.Y., Zoph, B., Le, Q.V., Dean, J.: Efficient neural architecture search via parameter sharing. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018. Proceedings of Machine Learning Research, vol. 80, pp. 4092–4101. PMLR, Stockholm, Sweden (2018). http://proceedings.mlr.press/v80/pham18a.html (13) Liu, H., Simonyan, K., Yang, Y.: DARTS: differentiable architecture search. In: 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=S1eYHoC5FX (14) Bello, I., Zoph, B., Vasudevan, V., Le, Q.V.: Neural optimizer search with reinforcement learning. In: Precup, D., Teh, Y.W. (eds.) Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017. Proceedings of Machine Learning Research, vol. 70, pp. 459–468. PMLR, Sydney, NSW, Australia (2017). http://proceedings.mlr.press/v70/bello17a.html (15) Xu, Z., Dai, A.M., Kemp, J., Metz, L.: Learning an adaptive learning rate schedule. CoRR abs/1909.09712 (2019) arXiv:1909.09712 (16) Shu, J., Zhu, Y., Zhao, Q., Meng, D., Xu, Z.: Meta-LR-Schedule-Net: Learned LR Schedules that Scale and Generalize (2020) (17) Daniel, C., Taylor, J., Nowozin, S.: Learning step size controllers for robust neural network training. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (18) Wu, Y., Ren, M., Liao, R., Grosse, R.B.: Understanding short-horizon bias in stochastic meta-optimization. 
In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, Vancouver, BC, Canada (2018). https://openreview.net/forum?id=H1MczcgR- (19) Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12(61), 2121–2159 (2011) (20) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6980 (21) Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. 
In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. 
http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). 
http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach (2020). 
https://openreview.net/forum?id=SyxS0T4tvS (5) Coleman, C.A., Narayanan, D., Kang, D., Zhao, T., Zhang, J., Nardi, L., Bailis, P., Olukotun, K., Re, C., Zaharia, M.: DAWNBench: An End-to-End Deep Learning Benchmark and Competition. NIPS ML Systems Workshop (2017) (6) Loshchilov, I., Hutter, F.: SGDR: stochastic gradient descent with warm restarts. In: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net, Toulon, France (2017). https://openreview.net/forum?id=Skq89Scxx (7) Goyal, P., Dollár, P., Girshick, R.B., Noordhuis, P., Wesolowski, L., Kyrola, A., Tulloch, A., Jia, Y., He, K.: Accurate, large minibatch SGD: training imagenet in 1 hour. CoRR abs/1706.02677 (2017) arXiv:1706.02677 (8) Smith, L.N., Topin, N.: Super-convergence: Very fast training of residual networks using large learning rates. CoRR abs/1708.07120 (2017) arXiv:1708.07120 (9) Vaswani, S., Mishkin, A., Laradji, I., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Advances in Neural Information Processing Systems, pp. 3727–3740 (2019) (10) Mahsereci, M., Hennig, P.: Probabilistic line searches for stochastic optimization. The Journal of Machine Learning Research 18(1), 4262–4320 (2017) (11) Zoph, B., Le, Q.V.: Neural architecture search with reinforcement learning. In: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net, Toulon, France (2017). https://openreview.net/forum?id=r1Ue8Hcxg (12) Pham, H., Guan, M.Y., Zoph, B., Le, Q.V., Dean, J.: Efficient neural architecture search via parameter sharing. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018. Proceedings of Machine Learning Research, vol. 80, pp. 4092–4101. 
PMLR, Stockholm, Sweden (2018). http://proceedings.mlr.press/v80/pham18a.html (13) Liu, H., Simonyan, K., Yang, Y.: DARTS: differentiable architecture search. In: 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=S1eYHoC5FX (14) Bello, I., Zoph, B., Vasudevan, V., Le, Q.V.: Neural optimizer search with reinforcement learning. In: Precup, D., Teh, Y.W. (eds.) Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017. Proceedings of Machine Learning Research, vol. 70, pp. 459–468. PMLR, Sydney, NSW, Australia (2017). http://proceedings.mlr.press/v70/bello17a.html (15) Xu, Z., Dai, A.M., Kemp, J., Metz, L.: Learning an adaptive learning rate schedule. CoRR abs/1909.09712 (2019) arXiv:1909.09712 (16) Shu, J., Zhu, Y., Zhao, Q., Meng, D., Xu, Z.: Meta-LR-Schedule-Net: Learned LR Schedules that Scale and Generalize (2020) (17) Daniel, C., Taylor, J., Nowozin, S.: Learning step size controllers for robust neural network training. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (18) Wu, Y., Ren, M., Liao, R., Grosse, R.B.: Understanding short-horizon bias in stochastic meta-optimization. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, Vancouver, BC, Canada (2018). https://openreview.net/forum?id=H1MczcgR- (19) Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12(61), 2121–2159 (2011) (20) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). 
http://arxiv.org/abs/1412.6980 (21) Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). 
http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016).
https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells III, W.M., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019).
https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015). https://doi.org/10.1126/science.aab3050 (38) Koch, G., Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Computation 9(8), 1735–1780 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019).
https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04
http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. 
In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. 
In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. 
https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
Accessed: 2020-11-04 Smith, L.N., Topin, N.: Super-convergence: Very fast training of residual networks using large learning rates. CoRR abs/1708.07120 (2017) arXiv:1708.07120 (9) Vaswani, S., Mishkin, A., Laradji, I., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Advances in Neural Information Processing Systems, pp. 3727–3740 (2019) (10) Mahsereci, M., Hennig, P.: Probabilistic line searches for stochastic optimization. The Journal of Machine Learning Research 18(1), 4262–4320 (2017) (11) Zoph, B., Le, Q.V.: Neural architecture search with reinforcement learning. In: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net, Toulon, France (2017). https://openreview.net/forum?id=r1Ue8Hcxg (12) Pham, H., Guan, M.Y., Zoph, B., Le, Q.V., Dean, J.: Efficient neural architecture search via parameter sharing. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018. Proceedings of Machine Learning Research, vol. 80, pp. 4092–4101. PMLR, Stockholm, Sweden (2018). http://proceedings.mlr.press/v80/pham18a.html (13) Liu, H., Simonyan, K., Yang, Y.: DARTS: differentiable architecture search. In: 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=S1eYHoC5FX (14) Bello, I., Zoph, B., Vasudevan, V., Le, Q.V.: Neural optimizer search with reinforcement learning. In: Precup, D., Teh, Y.W. (eds.) Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017. Proceedings of Machine Learning Research, vol. 70, pp. 459–468. PMLR, Sydney, NSW, Australia (2017). 
http://proceedings.mlr.press/v70/bello17a.html (15) Xu, Z., Dai, A.M., Kemp, J., Metz, L.: Learning an adaptive learning rate schedule. CoRR abs/1909.09712 (2019) arXiv:1909.09712 (16) Shu, J., Zhu, Y., Zhao, Q., Meng, D., Xu, Z.: Meta-LR-Schedule-Net: Learned LR Schedules that Scale and Generalize (2020) (17) Daniel, C., Taylor, J., Nowozin, S.: Learning step size controllers for robust neural network training. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (18) Wu, Y., Ren, M., Liao, R., Grosse, R.B.: Understanding short-horizon bias in stochastic meta-optimization. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, Vancouver, BC, Canada (2018). https://openreview.net/forum?id=H1MczcgR- (19) Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12(61), 2121–2159 (2011) (20) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6980 (21) Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). 
http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. 
In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. 
In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) 
Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). 
https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Vaswani, S., Mishkin, A., Laradji, I., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Advances in Neural Information Processing Systems, pp. 3727–3740 (2019) (10) Mahsereci, M., Hennig, P.: Probabilistic line searches for stochastic optimization. The Journal of Machine Learning Research 18(1), 4262–4320 (2017) (11) Zoph, B., Le, Q.V.: Neural architecture search with reinforcement learning. In: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net, Toulon, France (2017). https://openreview.net/forum?id=r1Ue8Hcxg (12) Pham, H., Guan, M.Y., Zoph, B., Le, Q.V., Dean, J.: Efficient neural architecture search via parameter sharing. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018. Proceedings of Machine Learning Research, vol. 80, pp. 4092–4101. PMLR, Stockholm, Sweden (2018). http://proceedings.mlr.press/v80/pham18a.html (13) Liu, H., Simonyan, K., Yang, Y.: DARTS: differentiable architecture search. In: 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=S1eYHoC5FX (14) Bello, I., Zoph, B., Vasudevan, V., Le, Q.V.: Neural optimizer search with reinforcement learning. In: Precup, D., Teh, Y.W. (eds.) Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017. Proceedings of Machine Learning Research, vol. 70, pp. 459–468. PMLR, Sydney, NSW, Australia (2017). 
http://proceedings.mlr.press/v70/bello17a.html (15) Xu, Z., Dai, A.M., Kemp, J., Metz, L.: Learning an adaptive learning rate schedule. CoRR abs/1909.09712 (2019) arXiv:1909.09712 (16) Shu, J., Zhu, Y., Zhao, Q., Meng, D., Xu, Z.: Meta-LR-Schedule-Net: Learned LR Schedules that Scale and Generalize (2020) (17) Daniel, C., Taylor, J., Nowozin, S.: Learning step size controllers for robust neural network training. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (18) Wu, Y., Ren, M., Liao, R., Grosse, R.B.: Understanding short-horizon bias in stochastic meta-optimization. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, Vancouver, BC, Canada (2018). https://openreview.net/forum?id=H1MczcgR- (19) Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12(61), 2121–2159 (2011) (20) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6980 (21) Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). 
http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. 
CoRR abs/1909.09712 (2019) arXiv:1909.09712 (16) Shu, J., Zhu, Y., Zhao, Q., Meng, D., Xu, Z.: Meta-LR-Schedule-Net: Learned LR Schedules that Scale and Generalize (2020) (17) Daniel, C., Taylor, J., Nowozin, S.: Learning step size controllers for robust neural network training. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (18) Wu, Y., Ren, M., Liao, R., Grosse, R.B.: Understanding short-horizon bias in stochastic meta-optimization. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, Vancouver, BC, Canada (2018). https://openreview.net/forum?id=H1MczcgR- (19) Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12(61), 2121–2159 (2011) (20) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6980 (21) Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. 
In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. 
https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). 
http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). 
https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
Accessed: 2020-11-04 Bello, I., Zoph, B., Vasudevan, V., Le, Q.V.: Neural optimizer search with reinforcement learning. In: Precup, D., Teh, Y.W. (eds.) Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017. Proceedings of Machine Learning Research, vol. 70, pp. 459–468. PMLR, Sydney, NSW, Australia (2017). http://proceedings.mlr.press/v70/bello17a.html (15) Xu, Z., Dai, A.M., Kemp, J., Metz, L.: Learning an adaptive learning rate schedule. CoRR abs/1909.09712 (2019) arXiv:1909.09712 (16) Shu, J., Zhu, Y., Zhao, Q., Meng, D., Xu, Z.: Meta-LR-Schedule-Net: Learned LR Schedules that Scale and Generalize (2020) (17) Daniel, C., Taylor, J., Nowozin, S.: Learning step size controllers for robust neural network training. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (18) Wu, Y., Ren, M., Liao, R., Grosse, R.B.: Understanding short-horizon bias in stochastic meta-optimization. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, Vancouver, BC, Canada (2018). https://openreview.net/forum?id=H1MczcgR- (19) Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12(61), 2121–2159 (2011) (20) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6980 (21) Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. 
NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 
2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). 
https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. 
Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. 
Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Xu, Z., Dai, A.M., Kemp, J., Metz, L.: Learning an adaptive learning rate schedule. CoRR abs/1909.09712 (2019) arXiv:1909.09712 (16) Shu, J., Zhu, Y., Zhao, Q., Meng, D., Xu, Z.: Meta-LR-Schedule-Net: Learned LR Schedules that Scale and Generalize (2020) (17) Daniel, C., Taylor, J., Nowozin, S.: Learning step size controllers for robust neural network training. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (18) Wu, Y., Ren, M., Liao, R., Grosse, R.B.: Understanding short-horizon bias in stochastic meta-optimization. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, Vancouver, BC, Canada (2018). https://openreview.net/forum?id=H1MczcgR- (19) Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12(61), 2121–2159 (2011) (20) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6980 (21) Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. 
COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. 
In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) 
Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. 
Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6980 (21) Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. 
In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. 
http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). 
http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. 
NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 
2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). 
https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. 
Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. 
Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). 
https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. 
Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) 
https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). 
http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). 
http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) 
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. 
Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. 
Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). 
https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. 
Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) 
Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). 
https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. 
In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) 
Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). 
https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Loshchilov, I., Hutter, F.: SGDR: stochastic gradient descent with warm restarts. In: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net, Toulon, France (2017). https://openreview.net/forum?id=Skq89Scxx (7) Goyal, P., Dollár, P., Girshick, R.B., Noordhuis, P., Wesolowski, L., Kyrola, A., Tulloch, A., Jia, Y., He, K.: Accurate, large minibatch SGD: training imagenet in 1 hour. CoRR abs/1706.02677 (2017) arXiv:1706.02677 (8) Smith, L.N., Topin, N.: Super-convergence: Very fast training of residual networks using large learning rates. CoRR abs/1708.07120 (2017) arXiv:1708.07120 (9) Vaswani, S., Mishkin, A., Laradji, I., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Advances in Neural Information Processing Systems, pp. 3727–3740 (2019) (10) Mahsereci, M., Hennig, P.: Probabilistic line searches for stochastic optimization. The Journal of Machine Learning Research 18(1), 4262–4320 (2017) (11) Zoph, B., Le, Q.V.: Neural architecture search with reinforcement learning. In: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net, Toulon, France (2017). https://openreview.net/forum?id=r1Ue8Hcxg (12) Pham, H., Guan, M.Y., Zoph, B., Le, Q.V., Dean, J.: Efficient neural architecture search via parameter sharing. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018. Proceedings of Machine Learning Research, vol. 80, pp. 4092–4101. PMLR, Stockholm, Sweden (2018). 
http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. 
In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. 
In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) 
Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). 
https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Vaswani, S., Mishkin, A., Laradji, I., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Advances in Neural Information Processing Systems, pp. 3727–3740 (2019) (10) Mahsereci, M., Hennig, P.: Probabilistic line searches for stochastic optimization. The Journal of Machine Learning Research 18(1), 4262–4320 (2017) (11) Zoph, B., Le, Q.V.: Neural architecture search with reinforcement learning. In: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net, Toulon, France (2017). https://openreview.net/forum?id=r1Ue8Hcxg (12) Pham, H., Guan, M.Y., Zoph, B., Le, Q.V., Dean, J.: Efficient neural architecture search via parameter sharing. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018. Proceedings of Machine Learning Research, vol. 80, pp. 4092–4101. PMLR, Stockholm, Sweden (2018). http://proceedings.mlr.press/v80/pham18a.html (13) Liu, H., Simonyan, K., Yang, Y.: DARTS: differentiable architecture search. In: 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=S1eYHoC5FX (14) Bello, I., Zoph, B., Vasudevan, V., Le, Q.V.: Neural optimizer search with reinforcement learning. In: Precup, D., Teh, Y.W. (eds.) Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017. Proceedings of Machine Learning Research, vol. 70, pp. 459–468. PMLR, Sydney, NSW, Australia (2017). 
http://proceedings.mlr.press/v70/bello17a.html (15) Xu, Z., Dai, A.M., Kemp, J., Metz, L.: Learning an adaptive learning rate schedule. CoRR abs/1909.09712 (2019) arXiv:1909.09712 (16) Shu, J., Zhu, Y., Zhao, Q., Meng, D., Xu, Z.: Meta-LR-Schedule-Net: Learned LR Schedules that Scale and Generalize (2020) (17) Daniel, C., Taylor, J., Nowozin, S.: Learning step size controllers for robust neural network training. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (18) Wu, Y., Ren, M., Liao, R., Grosse, R.B.: Understanding short-horizon bias in stochastic meta-optimization. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, Vancouver, BC, Canada (2018). https://openreview.net/forum?id=H1MczcgR- (19) Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12(61), 2121–2159 (2011) (20) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6980 (21) Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). 
http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. 
In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. 
In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) 
Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). 
https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Mahsereci, M., Hennig, P.: Probabilistic line searches for stochastic optimization. The Journal of Machine Learning Research 18(1), 4262–4320 (2017) (11) Zoph, B., Le, Q.V.: Neural architecture search with reinforcement learning. In: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net, Toulon, France (2017). https://openreview.net/forum?id=r1Ue8Hcxg (12) Pham, H., Guan, M.Y., Zoph, B., Le, Q.V., Dean, J.: Efficient neural architecture search via parameter sharing. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018. Proceedings of Machine Learning Research, vol. 80, pp. 4092–4101. PMLR, Stockholm, Sweden (2018). http://proceedings.mlr.press/v80/pham18a.html (13) Liu, H., Simonyan, K., Yang, Y.: DARTS: differentiable architecture search. In: 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=S1eYHoC5FX (14) Bello, I., Zoph, B., Vasudevan, V., Le, Q.V.: Neural optimizer search with reinforcement learning. In: Precup, D., Teh, Y.W. (eds.) Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017. Proceedings of Machine Learning Research, vol. 70, pp. 459–468. PMLR, Sydney, NSW, Australia (2017). http://proceedings.mlr.press/v70/bello17a.html (15) Xu, Z., Dai, A.M., Kemp, J., Metz, L.: Learning an adaptive learning rate schedule. 
CoRR abs/1909.09712 (2019) arXiv:1909.09712 (16) Shu, J., Zhu, Y., Zhao, Q., Meng, D., Xu, Z.: Meta-LR-Schedule-Net: Learned LR Schedules that Scale and Generalize (2020) (17) Daniel, C., Taylor, J., Nowozin, S.: Learning step size controllers for robust neural network training. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (18) Wu, Y., Ren, M., Liao, R., Grosse, R.B.: Understanding short-horizon bias in stochastic meta-optimization. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, Vancouver, BC, Canada (2018). https://openreview.net/forum?id=H1MczcgR- (19) Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12(61), 2121–2159 (2011) (20) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6980 (21) Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. 
In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. 
NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 
2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). 
https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. 
Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. 
Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Xu, Z., Dai, A.M., Kemp, J., Metz, L.: Learning an adaptive learning rate schedule. CoRR abs/1909.09712 (2019) arXiv:1909.09712 (16) Shu, J., Zhu, Y., Zhao, Q., Meng, D., Xu, Z.: Meta-LR-Schedule-Net: Learned LR Schedules that Scale and Generalize (2020) (17) Daniel, C., Taylor, J., Nowozin, S.: Learning step size controllers for robust neural network training. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (18) Wu, Y., Ren, M., Liao, R., Grosse, R.B.: Understanding short-horizon bias in stochastic meta-optimization. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, Vancouver, BC, Canada (2018). https://openreview.net/forum?id=H1MczcgR- (19) Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12(61), 2121–2159 (2011) (20) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6980 (21) Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. 
COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. 
In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) 
Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. 
https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) 
Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Shu, J., Zhu, Y., Zhao, Q., Meng, D., Xu, Z.: Meta-LR-Schedule-Net: Learned LR Schedules that Scale and Generalize (2020) (17) Daniel, C., Taylor, J., Nowozin, S.: Learning step size controllers for robust neural network training. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (18) Wu, Y., Ren, M., Liao, R., Grosse, R.B.: Understanding short-horizon bias in stochastic meta-optimization. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, Vancouver, BC, Canada (2018). https://openreview.net/forum?id=H1MczcgR- (19) Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12(61), 2121–2159 (2011) (20) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). 
http://arxiv.org/abs/1412.6980 (21) Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). 
http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. 
https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. 
NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 
2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). 
https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. 
Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. 
Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). 
https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. 
Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) 
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. 
JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. 
In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. 
In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. 
https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. 
In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. 
Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). 
https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) 
Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. 
https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) 
Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). 
http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). 
https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
Accessed: 2020-11-04 Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. 
https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). 
https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). 
http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04
  3. Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A Robustly Optimized BERT Pretraining Approach (2020). https://openreview.net/forum?id=SyxS0T4tvS (5) Coleman, C.A., Narayanan, D., Kang, D., Zhao, T., Zhang, J., Nardi, L., Bailis, P., Olukotun, K., Re, C., Zaharia, M.: DAWNBench: An End-to-End Deep Learning Benchmark and Competition. NIPS ML Systems Workshop (2017) (6) Loshchilov, I., Hutter, F.: SGDR: stochastic gradient descent with warm restarts. In: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net, Toulon, France (2017). https://openreview.net/forum?id=Skq89Scxx (7) Goyal, P., Dollár, P., Girshick, R.B., Noordhuis, P., Wesolowski, L., Kyrola, A., Tulloch, A., Jia, Y., He, K.: Accurate, large minibatch SGD: training imagenet in 1 hour. CoRR abs/1706.02677 (2017) arXiv:1706.02677 (8) Smith, L.N., Topin, N.: Super-convergence: Very fast training of residual networks using large learning rates. CoRR abs/1708.07120 (2017) arXiv:1708.07120 (9) Vaswani, S., Mishkin, A., Laradji, I., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Advances in Neural Information Processing Systems, pp. 3727–3740 (2019) (10) Mahsereci, M., Hennig, P.: Probabilistic line searches for stochastic optimization. The Journal of Machine Learning Research 18(1), 4262–4320 (2017) (11) Zoph, B., Le, Q.V.: Neural architecture search with reinforcement learning. In: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net, Toulon, France (2017). https://openreview.net/forum?id=r1Ue8Hcxg (12) Pham, H., Guan, M.Y., Zoph, B., Le, Q.V., Dean, J.: Efficient neural architecture search via parameter sharing. In: Dy, J.G., Krause, A. (eds.) 
Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018. Proceedings of Machine Learning Research, vol. 80, pp. 4092–4101. PMLR, Stockholm, Sweden (2018). http://proceedings.mlr.press/v80/pham18a.html (13) Liu, H., Simonyan, K., Yang, Y.: DARTS: differentiable architecture search. In: 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=S1eYHoC5FX (14) Bello, I., Zoph, B., Vasudevan, V., Le, Q.V.: Neural optimizer search with reinforcement learning. In: Precup, D., Teh, Y.W. (eds.) Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017. Proceedings of Machine Learning Research, vol. 70, pp. 459–468. PMLR, Sydney, NSW, Australia (2017). http://proceedings.mlr.press/v70/bello17a.html (15) Xu, Z., Dai, A.M., Kemp, J., Metz, L.: Learning an adaptive learning rate schedule. CoRR abs/1909.09712 (2019) arXiv:1909.09712 (16) Shu, J., Zhu, Y., Zhao, Q., Meng, D., Xu, Z.: Meta-LR-Schedule-Net: Learned LR Schedules that Scale and Generalize (2020) (17) Daniel, C., Taylor, J., Nowozin, S.: Learning step size controllers for robust neural network training. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (18) Wu, Y., Ren, M., Liao, R., Grosse, R.B.: Understanding short-horizon bias in stochastic meta-optimization. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, Vancouver, BC, Canada (2018). https://openreview.net/forum?id=H1MczcgR- (19) Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. 
Journal of Machine Learning Research 12(61), 2121–2159 (2011) (20) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6980 (21) Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). 
In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. 
https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
Accessed: 2020-11-04
https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). 
http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). 
https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
Accessed: 2020-11-04 Zoph, B., Le, Q.V.: Neural architecture search with reinforcement learning. In: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net, Toulon, France (2017). https://openreview.net/forum?id=r1Ue8Hcxg (12) Pham, H., Guan, M.Y., Zoph, B., Le, Q.V., Dean, J.: Efficient neural architecture search via parameter sharing. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018. Proceedings of Machine Learning Research, vol. 80, pp. 4092–4101. PMLR, Stockholm, Sweden (2018). http://proceedings.mlr.press/v80/pham18a.html (13) Liu, H., Simonyan, K., Yang, Y.: DARTS: differentiable architecture search. In: 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=S1eYHoC5FX (14) Bello, I., Zoph, B., Vasudevan, V., Le, Q.V.: Neural optimizer search with reinforcement learning. In: Precup, D., Teh, Y.W. (eds.) Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017. Proceedings of Machine Learning Research, vol. 70, pp. 459–468. PMLR, Sydney, NSW, Australia (2017). http://proceedings.mlr.press/v70/bello17a.html (15) Xu, Z., Dai, A.M., Kemp, J., Metz, L.: Learning an adaptive learning rate schedule. CoRR abs/1909.09712 (2019) arXiv:1909.09712 (16) Shu, J., Zhu, Y., Zhao, Q., Meng, D., Xu, Z.: Meta-LR-Schedule-Net: Learned LR Schedules that Scale and Generalize (2020) (17) Daniel, C., Taylor, J., Nowozin, S.: Learning step size controllers for robust neural network training. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (18) Wu, Y., Ren, M., Liao, R., Grosse, R.B.: Understanding short-horizon bias in stochastic meta-optimization. 
In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, Vancouver, BC, Canada (2018). https://openreview.net/forum?id=H1MczcgR- (19) Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12(61), 2121–2159 (2011) (20) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6980 (21) Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. 
In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. 
http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). 
http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Pham, H., Guan, M.Y., Zoph, B., Le, Q.V., Dean, J.: Efficient neural architecture search via parameter sharing. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018. Proceedings of Machine Learning Research, vol. 80, pp. 4092–4101. PMLR, Stockholm, Sweden (2018). 
http://proceedings.mlr.press/v80/pham18a.html (13) Liu, H., Simonyan, K., Yang, Y.: DARTS: differentiable architecture search. In: 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=S1eYHoC5FX (14) Bello, I., Zoph, B., Vasudevan, V., Le, Q.V.: Neural optimizer search with reinforcement learning. In: Precup, D., Teh, Y.W. (eds.) Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017. Proceedings of Machine Learning Research, vol. 70, pp. 459–468. PMLR, Sydney, NSW, Australia (2017). http://proceedings.mlr.press/v70/bello17a.html (15) Xu, Z., Dai, A.M., Kemp, J., Metz, L.: Learning an adaptive learning rate schedule. CoRR abs/1909.09712 (2019) arXiv:1909.09712 (16) Shu, J., Zhu, Y., Zhao, Q., Meng, D., Xu, Z.: Meta-LR-Schedule-Net: Learned LR Schedules that Scale and Generalize (2020) (17) Daniel, C., Taylor, J., Nowozin, S.: Learning step size controllers for robust neural network training. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (18) Wu, Y., Ren, M., Liao, R., Grosse, R.B.: Understanding short-horizon bias in stochastic meta-optimization. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, Vancouver, BC, Canada (2018). https://openreview.net/forum?id=H1MczcgR- (19) Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12(61), 2121–2159 (2011) (20) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). 
http://arxiv.org/abs/1412.6980 (21) Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). 
http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. 
https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. 
https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). 
https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Daniel, C., Taylor, J., Nowozin, S.: Learning step size controllers for robust neural network training. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (18) Wu, Y., Ren, M., Liao, R., Grosse, R.B.: Understanding short-horizon bias in stochastic meta-optimization. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, Vancouver, BC, Canada (2018). https://openreview.net/forum?id=H1MczcgR- (19) Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12(61), 2121–2159 (2011) (20) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 
3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6980 (21) Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 
3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. 
IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. 
https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). 
https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Wu, Y., Ren, M., Liao, R., Grosse, R.B.: Understanding short-horizon bias in stochastic meta-optimization. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, Vancouver, BC, Canada (2018). https://openreview.net/forum?id=H1MczcgR- (19) Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12(61), 2121–2159 (2011) (20) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). 
http://arxiv.org/abs/1412.6980 (21) Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). 
http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. 
https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. 
https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). 
https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12(61), 2121–2159 (2011) (20) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6980 (21) Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. 
NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 
Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6980
Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012)
Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556
Accessed: 2020-11-04 Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. 
In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. 
In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) 
Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). 
https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. 
https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). 
http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). 
https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
Accessed: 2020-11-04 Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. 
In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. 
Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). 
https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). 
https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). 
Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). 
http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) 
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. 
Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. 
Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). 
https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. 
Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) 
Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). 
https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. 
Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. 
Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
Accessed: 2020-11-04 Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04
  4. Coleman, C.A., Narayanan, D., Kang, D., Zhao, T., Zhang, J., Nardi, L., Bailis, P., Olukotun, K., Re, C., Zaharia, M.: DAWNBench: An End-to-End Deep Learning Benchmark and Competition. NIPS ML Systems Workshop (2017)
http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. 
https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. 
https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). 
https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Loshchilov, I., Hutter, F.: SGDR: stochastic gradient descent with warm restarts. In: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net, Toulon, France (2017). https://openreview.net/forum?id=Skq89Scxx (7) Goyal, P., Dollár, P., Girshick, R.B., Noordhuis, P., Wesolowski, L., Kyrola, A., Tulloch, A., Jia, Y., He, K.: Accurate, large minibatch SGD: training imagenet in 1 hour. CoRR abs/1706.02677 (2017) arXiv:1706.02677 (8) Smith, L.N., Topin, N.: Super-convergence: Very fast training of residual networks using large learning rates. 
CoRR abs/1708.07120 (2017) arXiv:1708.07120 (9) Vaswani, S., Mishkin, A., Laradji, I., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Advances in Neural Information Processing Systems, pp. 3727–3740 (2019) (10) Mahsereci, M., Hennig, P.: Probabilistic line searches for stochastic optimization. The Journal of Machine Learning Research 18(1), 4262–4320 (2017) (11) Zoph, B., Le, Q.V.: Neural architecture search with reinforcement learning. In: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net, Toulon, France (2017). https://openreview.net/forum?id=r1Ue8Hcxg (12) Pham, H., Guan, M.Y., Zoph, B., Le, Q.V., Dean, J.: Efficient neural architecture search via parameter sharing. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018. Proceedings of Machine Learning Research, vol. 80, pp. 4092–4101. PMLR, Stockholm, Sweden (2018). http://proceedings.mlr.press/v80/pham18a.html (13) Liu, H., Simonyan, K., Yang, Y.: DARTS: differentiable architecture search. In: 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=S1eYHoC5FX (14) Bello, I., Zoph, B., Vasudevan, V., Le, Q.V.: Neural optimizer search with reinforcement learning. In: Precup, D., Teh, Y.W. (eds.) Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017. Proceedings of Machine Learning Research, vol. 70, pp. 459–468. PMLR, Sydney, NSW, Australia (2017). http://proceedings.mlr.press/v70/bello17a.html (15) Xu, Z., Dai, A.M., Kemp, J., Metz, L.: Learning an adaptive learning rate schedule. 
CoRR abs/1909.09712 (2019) arXiv:1909.09712 (16) Shu, J., Zhu, Y., Zhao, Q., Meng, D., Xu, Z.: Meta-LR-Schedule-Net: Learned LR Schedules that Scale and Generalize (2020) (17) Daniel, C., Taylor, J., Nowozin, S.: Learning step size controllers for robust neural network training. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (18) Wu, Y., Ren, M., Liao, R., Grosse, R.B.: Understanding short-horizon bias in stochastic meta-optimization. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, Vancouver, BC, Canada (2018). https://openreview.net/forum?id=H1MczcgR- (19) Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12(61), 2121–2159 (2011) (20) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6980 (21) Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. 
In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. 
https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). 
http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). 
https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
Accessed: 2020-11-04 Goyal, P., Dollár, P., Girshick, R.B., Noordhuis, P., Wesolowski, L., Kyrola, A., Tulloch, A., Jia, Y., He, K.: Accurate, large minibatch SGD: training imagenet in 1 hour. CoRR abs/1706.02677 (2017) arXiv:1706.02677 (8) Smith, L.N., Topin, N.: Super-convergence: Very fast training of residual networks using large learning rates. CoRR abs/1708.07120 (2017) arXiv:1708.07120 (9) Vaswani, S., Mishkin, A., Laradji, I., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Advances in Neural Information Processing Systems, pp. 3727–3740 (2019) (10) Mahsereci, M., Hennig, P.: Probabilistic line searches for stochastic optimization. The Journal of Machine Learning Research 18(1), 4262–4320 (2017) (11) Zoph, B., Le, Q.V.: Neural architecture search with reinforcement learning. In: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net, Toulon, France (2017). https://openreview.net/forum?id=r1Ue8Hcxg (12) Pham, H., Guan, M.Y., Zoph, B., Le, Q.V., Dean, J.: Efficient neural architecture search via parameter sharing. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018. Proceedings of Machine Learning Research, vol. 80, pp. 4092–4101. PMLR, Stockholm, Sweden (2018). http://proceedings.mlr.press/v80/pham18a.html (13) Liu, H., Simonyan, K., Yang, Y.: DARTS: differentiable architecture search. In: 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=S1eYHoC5FX (14) Bello, I., Zoph, B., Vasudevan, V., Le, Q.V.: Neural optimizer search with reinforcement learning. In: Precup, D., Teh, Y.W. (eds.) 
Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017. Proceedings of Machine Learning Research, vol. 70, pp. 459–468. PMLR, Sydney, NSW, Australia (2017). http://proceedings.mlr.press/v70/bello17a.html (15) Xu, Z., Dai, A.M., Kemp, J., Metz, L.: Learning an adaptive learning rate schedule. CoRR abs/1909.09712 (2019) arXiv:1909.09712 (16) Shu, J., Zhu, Y., Zhao, Q., Meng, D., Xu, Z.: Meta-LR-Schedule-Net: Learned LR Schedules that Scale and Generalize (2020) (17) Daniel, C., Taylor, J., Nowozin, S.: Learning step size controllers for robust neural network training. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (18) Wu, Y., Ren, M., Liao, R., Grosse, R.B.: Understanding short-horizon bias in stochastic meta-optimization. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, Vancouver, BC, Canada (2018). https://openreview.net/forum?id=H1MczcgR- (19) Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12(61), 2121–2159 (2011) (20) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6980 (21) Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. 
NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 
Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Zoph, B., Le, Q.V.: Neural architecture search with reinforcement learning. In: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net, Toulon, France (2017). https://openreview.net/forum?id=r1Ue8Hcxg (12) Pham, H., Guan, M.Y., Zoph, B., Le, Q.V., Dean, J.: Efficient neural architecture search via parameter sharing. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018. Proceedings of Machine Learning Research, vol. 80, pp. 4092–4101. PMLR, Stockholm, Sweden (2018). http://proceedings.mlr.press/v80/pham18a.html (13) Liu, H., Simonyan, K., Yang, Y.: DARTS: differentiable architecture search. In: 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=S1eYHoC5FX (14) Bello, I., Zoph, B., Vasudevan, V., Le, Q.V.: Neural optimizer search with reinforcement learning. In: Precup, D., Teh, Y.W. (eds.) 
Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017. Proceedings of Machine Learning Research, vol. 70, pp. 459–468. PMLR, Sydney, NSW, Australia (2017). http://proceedings.mlr.press/v70/bello17a.html (15) Xu, Z., Dai, A.M., Kemp, J., Metz, L.: Learning an adaptive learning rate schedule. CoRR abs/1909.09712 (2019) arXiv:1909.09712 (16) Shu, J., Zhu, Y., Zhao, Q., Meng, D., Xu, Z.: Meta-LR-Schedule-Net: Learned LR Schedules that Scale and Generalize (2020) (17) Daniel, C., Taylor, J., Nowozin, S.: Learning step size controllers for robust neural network training. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (18) Wu, Y., Ren, M., Liao, R., Grosse, R.B.: Understanding short-horizon bias in stochastic meta-optimization. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, Vancouver, BC, Canada (2018). https://openreview.net/forum?id=H1MczcgR- (19) Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12(61), 2121–2159 (2011) (20) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6980 (21) Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. 
NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 
2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). 
https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. 
Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. 
Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Pham, H., Guan, M.Y., Zoph, B., Le, Q.V., Dean, J.: Efficient neural architecture search via parameter sharing. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018. Proceedings of Machine Learning Research, vol. 80, pp. 4092–4101. PMLR, Stockholm, Sweden (2018). http://proceedings.mlr.press/v80/pham18a.html (13) Liu, H., Simonyan, K., Yang, Y.: DARTS: differentiable architecture search. In: 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=S1eYHoC5FX (14) Bello, I., Zoph, B., Vasudevan, V., Le, Q.V.: Neural optimizer search with reinforcement learning. In: Precup, D., Teh, Y.W. (eds.) Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017. Proceedings of Machine Learning Research, vol. 70, pp. 459–468. PMLR, Sydney, NSW, Australia (2017). http://proceedings.mlr.press/v70/bello17a.html (15) Xu, Z., Dai, A.M., Kemp, J., Metz, L.: Learning an adaptive learning rate schedule. CoRR abs/1909.09712 (2019) arXiv:1909.09712 (16) Shu, J., Zhu, Y., Zhao, Q., Meng, D., Xu, Z.: Meta-LR-Schedule-Net: Learned LR Schedules that Scale and Generalize (2020) (17) Daniel, C., Taylor, J., Nowozin, S.: Learning step size controllers for robust neural network training. 
In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (18) Wu, Y., Ren, M., Liao, R., Grosse, R.B.: Understanding short-horizon bias in stochastic meta-optimization. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, Vancouver, BC, Canada (2018). https://openreview.net/forum?id=H1MczcgR- (19) Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12(61), 2121–2159 (2011) (20) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6980 (21) Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). 
http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. 
In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. 
In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. 
https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Liu, H., Simonyan, K., Yang, Y.: DARTS: differentiable architecture search. 
In: 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=S1eYHoC5FX (14) Bello, I., Zoph, B., Vasudevan, V., Le, Q.V.: Neural optimizer search with reinforcement learning. In: Precup, D., Teh, Y.W. (eds.) Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017. Proceedings of Machine Learning Research, vol. 70, pp. 459–468. PMLR, Sydney, NSW, Australia (2017). http://proceedings.mlr.press/v70/bello17a.html (15) Xu, Z., Dai, A.M., Kemp, J., Metz, L.: Learning an adaptive learning rate schedule. CoRR abs/1909.09712 (2019) arXiv:1909.09712 (16) Shu, J., Zhu, Y., Zhao, Q., Meng, D., Xu, Z.: Meta-LR-Schedule-Net: Learned LR Schedules that Scale and Generalize (2020) (17) Daniel, C., Taylor, J., Nowozin, S.: Learning step size controllers for robust neural network training. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (18) Wu, Y., Ren, M., Liao, R., Grosse, R.B.: Understanding short-horizon bias in stochastic meta-optimization. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, Vancouver, BC, Canada (2018). https://openreview.net/forum?id=H1MczcgR- (19) Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12(61), 2121–2159 (2011) (20) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6980 (21) Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. 
COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. 
In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) 
Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. 
https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) 
Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Bello, I., Zoph, B., Vasudevan, V., Le, Q.V.: Neural optimizer search with reinforcement learning. In: Precup, D., Teh, Y.W. (eds.) Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017. Proceedings of Machine Learning Research, vol. 70, pp. 459–468. PMLR, Sydney, NSW, Australia (2017). http://proceedings.mlr.press/v70/bello17a.html (15) Xu, Z., Dai, A.M., Kemp, J., Metz, L.: Learning an adaptive learning rate schedule. CoRR abs/1909.09712 (2019) arXiv:1909.09712 (16) Shu, J., Zhu, Y., Zhao, Q., Meng, D., Xu, Z.: Meta-LR-Schedule-Net: Learned LR Schedules that Scale and Generalize (2020) (17) Daniel, C., Taylor, J., Nowozin, S.: Learning step size controllers for robust neural network training. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (18) Wu, Y., Ren, M., Liao, R., Grosse, R.B.: Understanding short-horizon bias in stochastic meta-optimization. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, Vancouver, BC, Canada (2018). 
https://openreview.net/forum?id=H1MczcgR- (19) Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12(61), 2121–2159 (2011) (20) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6980 (21) Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. 
http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). 
http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Xu, Z., Dai, A.M., Kemp, J., Metz, L.: Learning an adaptive learning rate schedule. CoRR abs/1909.09712 (2019) arXiv:1909.09712 (16) Shu, J., Zhu, Y., Zhao, Q., Meng, D., Xu, Z.: Meta-LR-Schedule-Net: Learned LR Schedules that Scale and Generalize (2020) (17) Daniel, C., Taylor, J., Nowozin, S.: Learning step size controllers for robust neural network training. 
In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (18) Wu, Y., Ren, M., Liao, R., Grosse, R.B.: Understanding short-horizon bias in stochastic meta-optimization. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, Vancouver, BC, Canada (2018). https://openreview.net/forum?id=H1MczcgR- (19) Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12(61), 2121–2159 (2011) (20) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6980 (21) Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). 
http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. 
In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. 
In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. 
https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
Accessed: 2020-11-04 Shu, J., Zhu, Y., Zhao, Q., Meng, D., Xu, Z.: Meta-LR-Schedule-Net: Learned LR Schedules that Scale and Generalize (2020) (17) Daniel, C., Taylor, J., Nowozin, S.: Learning step size controllers for robust neural network training. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (18) Wu, Y., Ren, M., Liao, R., Grosse, R.B.: Understanding short-horizon bias in stochastic meta-optimization. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, Vancouver, BC, Canada (2018). https://openreview.net/forum?id=H1MczcgR- (19) Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12(61), 2121–2159 (2011) (20) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6980 (21) Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) 
Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. 
Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. 
Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 
8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. 
Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) 
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. 
In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. 
https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). 
http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). 
https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
Accessed: 2020-11-04 Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. 
IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. 
https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. 
http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. 
http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). 
http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. 
In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. 
Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). 
https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. 
https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. 
Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. 
Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) 
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. 
In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. 
https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
Accessed: 2020-11-04 Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). 
https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
Accessed: 2020-11-04 Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). 
https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. 
ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). 
http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. 
https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
Accessed: 2020-11-04 Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. 
Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. 
In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
Accessed: 2020-11-04 Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). 
http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. 
https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04
  5. Loshchilov, I., Hutter, F.: SGDR: stochastic gradient descent with warm restarts. In: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net, Toulon, France (2017). https://openreview.net/forum?id=Skq89Scxx
Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Goyal, P., Dollár, P., Girshick, R.B., Noordhuis, P., Wesolowski, L., Kyrola, A., Tulloch, A., Jia, Y., He, K.: Accurate, large minibatch SGD: training imagenet in 1 hour. CoRR abs/1706.02677 (2017) arXiv:1706.02677 (8) Smith, L.N., Topin, N.: Super-convergence: Very fast training of residual networks using large learning rates. CoRR abs/1708.07120 (2017) arXiv:1708.07120 (9) Vaswani, S., Mishkin, A., Laradji, I., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Advances in Neural Information Processing Systems, pp. 3727–3740 (2019) (10) Mahsereci, M., Hennig, P.: Probabilistic line searches for stochastic optimization. The Journal of Machine Learning Research 18(1), 4262–4320 (2017) (11) Zoph, B., Le, Q.V.: Neural architecture search with reinforcement learning. In: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net, Toulon, France (2017). https://openreview.net/forum?id=r1Ue8Hcxg (12) Pham, H., Guan, M.Y., Zoph, B., Le, Q.V., Dean, J.: Efficient neural architecture search via parameter sharing. In: Dy, J.G., Krause, A. (eds.) 
Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018. Proceedings of Machine Learning Research, vol. 80, pp. 4092–4101. PMLR, Stockholm, Sweden (2018). http://proceedings.mlr.press/v80/pham18a.html (13) Liu, H., Simonyan, K., Yang, Y.: DARTS: differentiable architecture search. In: 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=S1eYHoC5FX (14) Bello, I., Zoph, B., Vasudevan, V., Le, Q.V.: Neural optimizer search with reinforcement learning. In: Precup, D., Teh, Y.W. (eds.) Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017. Proceedings of Machine Learning Research, vol. 70, pp. 459–468. PMLR, Sydney, NSW, Australia (2017). http://proceedings.mlr.press/v70/bello17a.html (15) Xu, Z., Dai, A.M., Kemp, J., Metz, L.: Learning an adaptive learning rate schedule. CoRR abs/1909.09712 (2019) arXiv:1909.09712 (16) Shu, J., Zhu, Y., Zhao, Q., Meng, D., Xu, Z.: Meta-LR-Schedule-Net: Learned LR Schedules that Scale and Generalize (2020) (17) Daniel, C., Taylor, J., Nowozin, S.: Learning step size controllers for robust neural network training. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (18) Wu, Y., Ren, M., Liao, R., Grosse, R.B.: Understanding short-horizon bias in stochastic meta-optimization. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, Vancouver, BC, Canada (2018). https://openreview.net/forum?id=H1MczcgR- (19) Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. 
Journal of Machine Learning Research 12(61), 2121–2159 (2011) (20) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6980 (21) Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). 
https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. 
Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) 
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Smith, L.N., Topin, N.: Super-convergence: Very fast training of residual networks using large learning rates. CoRR abs/1708.07120 (2017) arXiv:1708.07120 (9) Vaswani, S., Mishkin, A., Laradji, I., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Advances in Neural Information Processing Systems, pp. 3727–3740 (2019) (10) Mahsereci, M., Hennig, P.: Probabilistic line searches for stochastic optimization. The Journal of Machine Learning Research 18(1), 4262–4320 (2017) (11) Zoph, B., Le, Q.V.: Neural architecture search with reinforcement learning. 
In: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net, Toulon, France (2017). https://openreview.net/forum?id=r1Ue8Hcxg (12) Pham, H., Guan, M.Y., Zoph, B., Le, Q.V., Dean, J.: Efficient neural architecture search via parameter sharing. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018. Proceedings of Machine Learning Research, vol. 80, pp. 4092–4101. PMLR, Stockholm, Sweden (2018). http://proceedings.mlr.press/v80/pham18a.html (13) Liu, H., Simonyan, K., Yang, Y.: DARTS: differentiable architecture search. In: 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=S1eYHoC5FX (14) Bello, I., Zoph, B., Vasudevan, V., Le, Q.V.: Neural optimizer search with reinforcement learning. In: Precup, D., Teh, Y.W. (eds.) Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017. Proceedings of Machine Learning Research, vol. 70, pp. 459–468. PMLR, Sydney, NSW, Australia (2017). http://proceedings.mlr.press/v70/bello17a.html (15) Xu, Z., Dai, A.M., Kemp, J., Metz, L.: Learning an adaptive learning rate schedule. CoRR abs/1909.09712 (2019) arXiv:1909.09712 (16) Shu, J., Zhu, Y., Zhao, Q., Meng, D., Xu, Z.: Meta-LR-Schedule-Net: Learned LR Schedules that Scale and Generalize (2020) (17) Daniel, C., Taylor, J., Nowozin, S.: Learning step size controllers for robust neural network training. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (18) Wu, Y., Ren, M., Liao, R., Grosse, R.B.: Understanding short-horizon bias in stochastic meta-optimization. 
In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, Vancouver, BC, Canada (2018). https://openreview.net/forum?id=H1MczcgR- (19) Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12(61), 2121–2159 (2011) (20) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6980 (21) Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. 
In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. 
http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). 
http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Vaswani, S., Mishkin, A., Laradji, I., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Advances in Neural Information Processing Systems, pp. 3727–3740 (2019) (10) Mahsereci, M., Hennig, P.: Probabilistic line searches for stochastic optimization. 
https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. 
https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). 
https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Liu, H., Simonyan, K., Yang, Y.: DARTS: differentiable architecture search. In: 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=S1eYHoC5FX (14) Bello, I., Zoph, B., Vasudevan, V., Le, Q.V.: Neural optimizer search with reinforcement learning. In: Precup, D., Teh, Y.W. (eds.) Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017. Proceedings of Machine Learning Research, vol. 70, pp. 459–468. PMLR, Sydney, NSW, Australia (2017). http://proceedings.mlr.press/v70/bello17a.html (15) Xu, Z., Dai, A.M., Kemp, J., Metz, L.: Learning an adaptive learning rate schedule. 
CoRR abs/1909.09712 (2019) arXiv:1909.09712 (16) Shu, J., Zhu, Y., Zhao, Q., Meng, D., Xu, Z.: Meta-LR-Schedule-Net: Learned LR Schedules that Scale and Generalize (2020) (17) Daniel, C., Taylor, J., Nowozin, S.: Learning step size controllers for robust neural network training. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (18) Wu, Y., Ren, M., Liao, R., Grosse, R.B.: Understanding short-horizon bias in stochastic meta-optimization. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, Vancouver, BC, Canada (2018). https://openreview.net/forum?id=H1MczcgR- (19) Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12(61), 2121–2159 (2011) (20) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6980 (21) Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. 
In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. 
https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). 
http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). 
https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
Accessed: 2020-11-04 Bello, I., Zoph, B., Vasudevan, V., Le, Q.V.: Neural optimizer search with reinforcement learning. In: Precup, D., Teh, Y.W. (eds.) Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017. Proceedings of Machine Learning Research, vol. 70, pp. 459–468. PMLR, Sydney, NSW, Australia (2017). http://proceedings.mlr.press/v70/bello17a.html (15) Xu, Z., Dai, A.M., Kemp, J., Metz, L.: Learning an adaptive learning rate schedule. CoRR abs/1909.09712 (2019) arXiv:1909.09712 (16) Shu, J., Zhu, Y., Zhao, Q., Meng, D., Xu, Z.: Meta-LR-Schedule-Net: Learned LR Schedules that Scale and Generalize (2020) (17) Daniel, C., Taylor, J., Nowozin, S.: Learning step size controllers for robust neural network training. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (18) Wu, Y., Ren, M., Liao, R., Grosse, R.B.: Understanding short-horizon bias in stochastic meta-optimization. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, Vancouver, BC, Canada (2018). https://openreview.net/forum?id=H1MczcgR- (19) Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12(61), 2121–2159 (2011) (20) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6980 (21) Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. 
NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 
2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). 
https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. 
Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. 
NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 
2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). 
https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. 
Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. 
Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6980 (21) Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. 
In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. 
http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). 
http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. 
NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 
2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). 
https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. 
Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. 
https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). 
http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). 
https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
Accessed: 2020-11-04 Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) 
Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. 
https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) 
Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. 
In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. 
Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). 
https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. 
In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. 
Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). 
https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) 
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. 
Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
Accessed: 2020-11-04 Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04
In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) 
Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. 
https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) 
Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Mahsereci, M., Hennig, P.: Probabilistic line searches for stochastic optimization. The Journal of Machine Learning Research 18(1), 4262–4320 (2017) (11) Zoph, B., Le, Q.V.: Neural architecture search with reinforcement learning. In: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net, Toulon, France (2017). https://openreview.net/forum?id=r1Ue8Hcxg (12) Pham, H., Guan, M.Y., Zoph, B., Le, Q.V., Dean, J.: Efficient neural architecture search via parameter sharing. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018. Proceedings of Machine Learning Research, vol. 80, pp. 4092–4101. PMLR, Stockholm, Sweden (2018). http://proceedings.mlr.press/v80/pham18a.html (13) Liu, H., Simonyan, K., Yang, Y.: DARTS: differentiable architecture search. In: 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). 
https://openreview.net/forum?id=S1eYHoC5FX (14) Bello, I., Zoph, B., Vasudevan, V., Le, Q.V.: Neural optimizer search with reinforcement learning. In: Precup, D., Teh, Y.W. (eds.) Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017. Proceedings of Machine Learning Research, vol. 70, pp. 459–468. PMLR, Sydney, NSW, Australia (2017). http://proceedings.mlr.press/v70/bello17a.html (15) Xu, Z., Dai, A.M., Kemp, J., Metz, L.: Learning an adaptive learning rate schedule. CoRR abs/1909.09712 (2019) arXiv:1909.09712 (16) Shu, J., Zhu, Y., Zhao, Q., Meng, D., Xu, Z.: Meta-LR-Schedule-Net: Learned LR Schedules that Scale and Generalize (2020) (17) Daniel, C., Taylor, J., Nowozin, S.: Learning step size controllers for robust neural network training. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (18) Wu, Y., Ren, M., Liao, R., Grosse, R.B.: Understanding short-horizon bias in stochastic meta-optimization. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, Vancouver, BC, Canada (2018). https://openreview.net/forum?id=H1MczcgR- (19) Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12(61), 2121–2159 (2011) (20) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6980 (21) Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. 
COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. 
In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) 
Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. 
https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) 
Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Zoph, B., Le, Q.V.: Neural architecture search with reinforcement learning. In: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net, Toulon, France (2017). https://openreview.net/forum?id=r1Ue8Hcxg (12) Pham, H., Guan, M.Y., Zoph, B., Le, Q.V., Dean, J.: Efficient neural architecture search via parameter sharing. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018. Proceedings of Machine Learning Research, vol. 80, pp. 4092–4101. PMLR, Stockholm, Sweden (2018). http://proceedings.mlr.press/v80/pham18a.html (13) Liu, H., Simonyan, K., Yang, Y.: DARTS: differentiable architecture search. In: 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=S1eYHoC5FX (14) Bello, I., Zoph, B., Vasudevan, V., Le, Q.V.: Neural optimizer search with reinforcement learning. In: Precup, D., Teh, Y.W. (eds.) 
Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017. Proceedings of Machine Learning Research, vol. 70, pp. 459–468. PMLR, Sydney, NSW, Australia (2017). http://proceedings.mlr.press/v70/bello17a.html (15) Xu, Z., Dai, A.M., Kemp, J., Metz, L.: Learning an adaptive learning rate schedule. CoRR abs/1909.09712 (2019) arXiv:1909.09712 (16) Shu, J., Zhu, Y., Zhao, Q., Meng, D., Xu, Z.: Meta-LR-Schedule-Net: Learned LR Schedules that Scale and Generalize (2020) (17) Daniel, C., Taylor, J., Nowozin, S.: Learning step size controllers for robust neural network training. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (18) Wu, Y., Ren, M., Liao, R., Grosse, R.B.: Understanding short-horizon bias in stochastic meta-optimization. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, Vancouver, BC, Canada (2018). https://openreview.net/forum?id=H1MczcgR- (19) Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12(61), 2121–2159 (2011) (20) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6980 (21) Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. 
NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 
2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). 
In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. 
In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. 
https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
Accessed: 2020-11-04 Shu, J., Zhu, Y., Zhao, Q., Meng, D., Xu, Z.: Meta-LR-Schedule-Net: Learned LR Schedules that Scale and Generalize (2020) (17) Daniel, C., Taylor, J., Nowozin, S.: Learning step size controllers for robust neural network training. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (18) Wu, Y., Ren, M., Liao, R., Grosse, R.B.: Understanding short-horizon bias in stochastic meta-optimization. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, Vancouver, BC, Canada (2018). https://openreview.net/forum?id=H1MczcgR- (19) Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12(61), 2121–2159 (2011) (20) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6980 (21) Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) 
Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. 
https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). 
http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). 
https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
Accessed: 2020-11-04 Daniel, C., Taylor, J., Nowozin, S.: Learning step size controllers for robust neural network training. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (18) Wu, Y., Ren, M., Liao, R., Grosse, R.B.: Understanding short-horizon bias in stochastic meta-optimization. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, Vancouver, BC, Canada (2018). https://openreview.net/forum?id=H1MczcgR- (19) Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12(61), 2121–2159 (2011) (20) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6980 (21) Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. 
JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. 
https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). 
http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). 
https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
Accessed: 2020-11-04
http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). 
https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
Accessed: 2020-11-04 Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. 
IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. 
https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. 
Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. 
Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. 
In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. 
Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. 
ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). 
http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. 
https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. 
https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. 
https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. 
Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. 
Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. 
In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
Accessed: 2020-11-04 Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). 
http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. 
https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04
  7. Smith, L.N., Topin, N.: Super-convergence: Very fast training of residual networks using large learning rates. CoRR abs/1708.07120 (2017) arXiv:1708.07120
  8. Vaswani, S., Mishkin, A., Laradji, I., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Advances in Neural Information Processing Systems, pp. 3727–3740 (2019)
  9. Mahsereci, M., Hennig, P.: Probabilistic line searches for stochastic optimization. The Journal of Machine Learning Research 18(1), 4262–4320 (2017)
  10. Zoph, B., Le, Q.V.: Neural architecture search with reinforcement learning. In: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net, Toulon, France (2017). https://openreview.net/forum?id=r1Ue8Hcxg
  11. Pham, H., Guan, M.Y., Zoph, B., Le, Q.V., Dean, J.: Efficient neural architecture search via parameter sharing. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018. Proceedings of Machine Learning Research, vol. 80, pp. 4092–4101. PMLR, Stockholm, Sweden (2018). http://proceedings.mlr.press/v80/pham18a.html
  12. Liu, H., Simonyan, K., Yang, Y.: DARTS: differentiable architecture search. In: 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=S1eYHoC5FX
  13. Bello, I., Zoph, B., Vasudevan, V., Le, Q.V.: Neural optimizer search with reinforcement learning. In: Precup, D., Teh, Y.W. (eds.) Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017. Proceedings of Machine Learning Research, vol. 70, pp. 459–468. PMLR, Sydney, NSW, Australia (2017). http://proceedings.mlr.press/v70/bello17a.html
  14. Xu, Z., Dai, A.M., Kemp, J., Metz, L.: Learning an adaptive learning rate schedule. CoRR abs/1909.09712 (2019) arXiv:1909.09712
  15. Shu, J., Zhu, Y., Zhao, Q., Meng, D., Xu, Z.: Meta-LR-Schedule-Net: Learned LR Schedules that Scale and Generalize (2020)
  16. Daniel, C., Taylor, J., Nowozin, S.: Learning step size controllers for robust neural network training. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016)
  17. Wu, Y., Ren, M., Liao, R., Grosse, R.B.: Understanding short-horizon bias in stochastic meta-optimization. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, Vancouver, BC, Canada (2018). https://openreview.net/forum?id=H1MczcgR-
  18. Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12(61), 2121–2159 (2011)
  19. Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6980
  20. Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012)
  21. Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011)
  22. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556
  23. Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html
  24. Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html
  25. Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572
  26. Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014)
  27. Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243
  28. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308
  29. Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html
  30. Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010)
  31. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90
  32. Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28
  33. Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015
  34. Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html
  35. Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423
  36. Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015). https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050
  37. Koch, G., Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015)
  38. Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04
  39. Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/
  40. Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014)
  41. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735
  42. Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html
  43. Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020)
  44. Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html
  45. Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04
  46. Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324
  47. Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04
In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. 
https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). 
http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). 
https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
Accessed: 2020-11-04 Zoph, B., Le, Q.V.: Neural architecture search with reinforcement learning. In: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net, Toulon, France (2017). https://openreview.net/forum?id=r1Ue8Hcxg (12) Pham, H., Guan, M.Y., Zoph, B., Le, Q.V., Dean, J.: Efficient neural architecture search via parameter sharing. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018. Proceedings of Machine Learning Research, vol. 80, pp. 4092–4101. PMLR, Stockholm, Sweden (2018). http://proceedings.mlr.press/v80/pham18a.html (13) Liu, H., Simonyan, K., Yang, Y.: DARTS: differentiable architecture search. In: 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=S1eYHoC5FX (14) Bello, I., Zoph, B., Vasudevan, V., Le, Q.V.: Neural optimizer search with reinforcement learning. In: Precup, D., Teh, Y.W. (eds.) Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017. Proceedings of Machine Learning Research, vol. 70, pp. 459–468. PMLR, Sydney, NSW, Australia (2017). http://proceedings.mlr.press/v70/bello17a.html (15) Xu, Z., Dai, A.M., Kemp, J., Metz, L.: Learning an adaptive learning rate schedule. CoRR abs/1909.09712 (2019) arXiv:1909.09712 (16) Shu, J., Zhu, Y., Zhao, Q., Meng, D., Xu, Z.: Meta-LR-Schedule-Net: Learned LR Schedules that Scale and Generalize (2020) (17) Daniel, C., Taylor, J., Nowozin, S.: Learning step size controllers for robust neural network training. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (18) Wu, Y., Ren, M., Liao, R., Grosse, R.B.: Understanding short-horizon bias in stochastic meta-optimization. 
In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, Vancouver, BC, Canada (2018). https://openreview.net/forum?id=H1MczcgR- (19) Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12(61), 2121–2159 (2011) (20) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6980 (21) Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. 
In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. 
http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). 
http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Pham, H., Guan, M.Y., Zoph, B., Le, Q.V., Dean, J.: Efficient neural architecture search via parameter sharing. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018. Proceedings of Machine Learning Research, vol. 80, pp. 4092–4101. PMLR, Stockholm, Sweden (2018). 
http://proceedings.mlr.press/v80/pham18a.html (13) Liu, H., Simonyan, K., Yang, Y.: DARTS: differentiable architecture search. In: 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=S1eYHoC5FX (14) Bello, I., Zoph, B., Vasudevan, V., Le, Q.V.: Neural optimizer search with reinforcement learning. In: Precup, D., Teh, Y.W. (eds.) Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017. Proceedings of Machine Learning Research, vol. 70, pp. 459–468. PMLR, Sydney, NSW, Australia (2017). http://proceedings.mlr.press/v70/bello17a.html (15) Xu, Z., Dai, A.M., Kemp, J., Metz, L.: Learning an adaptive learning rate schedule. CoRR abs/1909.09712 (2019) arXiv:1909.09712 (16) Shu, J., Zhu, Y., Zhao, Q., Meng, D., Xu, Z.: Meta-LR-Schedule-Net: Learned LR Schedules that Scale and Generalize (2020) (17) Daniel, C., Taylor, J., Nowozin, S.: Learning step size controllers for robust neural network training. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (18) Wu, Y., Ren, M., Liao, R., Grosse, R.B.: Understanding short-horizon bias in stochastic meta-optimization. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, Vancouver, BC, Canada (2018). https://openreview.net/forum?id=H1MczcgR- (19) Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12(61), 2121–2159 (2011) (20) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). 
http://arxiv.org/abs/1412.6980 (21) Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). 
http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. 
https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. 
https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). 
https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Daniel, C., Taylor, J., Nowozin, S.: Learning step size controllers for robust neural network training. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (18) Wu, Y., Ren, M., Liao, R., Grosse, R.B.: Understanding short-horizon bias in stochastic meta-optimization. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, Vancouver, BC, Canada (2018). https://openreview.net/forum?id=H1MczcgR- (19) Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12(61), 2121–2159 (2011) (20) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 
3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6980 (21) Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 
3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. 
IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. 
https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). 
https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Wu, Y., Ren, M., Liao, R., Grosse, R.B.: Understanding short-horizon bias in stochastic meta-optimization. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, Vancouver, BC, Canada (2018). https://openreview.net/forum?id=H1MczcgR- (19) Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12(61), 2121–2159 (2011) (20) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). 
http://arxiv.org/abs/1412.6980 (21) Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). 
http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. 
https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. 
https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). 
https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12(61), 2121–2159 (2011) (20) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6980 (21) Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. 
https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
Accessed: 2020-11-04 Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. 
In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. 
In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) 
Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). 
https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. 
https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). 
http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). 
https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
Accessed: 2020-11-04 Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. 
In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. 
Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). 
https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). 
http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). 
http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) 
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. 
Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. 
Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). 
https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. 
Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) 
Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). 
https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. 
Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. 
Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
Accessed: 2020-11-04 Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04
http://arxiv.org/abs/1412.6980 (21) Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). 
http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. 
https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. 
https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). 
https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Liu, H., Simonyan, K., Yang, Y.: DARTS: differentiable architecture search. In: 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=S1eYHoC5FX (14) Bello, I., Zoph, B., Vasudevan, V., Le, Q.V.: Neural optimizer search with reinforcement learning. In: Precup, D., Teh, Y.W. (eds.) Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017. Proceedings of Machine Learning Research, vol. 70, pp. 459–468. PMLR, Sydney, NSW, Australia (2017). http://proceedings.mlr.press/v70/bello17a.html (15) Xu, Z., Dai, A.M., Kemp, J., Metz, L.: Learning an adaptive learning rate schedule. 
CoRR abs/1909.09712 (2019) arXiv:1909.09712 (16) Shu, J., Zhu, Y., Zhao, Q., Meng, D., Xu, Z.: Meta-LR-Schedule-Net: Learned LR Schedules that Scale and Generalize (2020) (17) Daniel, C., Taylor, J., Nowozin, S.: Learning step size controllers for robust neural network training. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (18) Wu, Y., Ren, M., Liao, R., Grosse, R.B.: Understanding short-horizon bias in stochastic meta-optimization. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, Vancouver, BC, Canada (2018). https://openreview.net/forum?id=H1MczcgR- (19) Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12(61), 2121–2159 (2011) (20) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6980 (21) Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. 
In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. 
https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). 
http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). 
https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
Accessed: 2020-11-04 Bello, I., Zoph, B., Vasudevan, V., Le, Q.V.: Neural optimizer search with reinforcement learning. In: Precup, D., Teh, Y.W. (eds.) Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017. Proceedings of Machine Learning Research, vol. 70, pp. 459–468. PMLR, Sydney, NSW, Australia (2017). http://proceedings.mlr.press/v70/bello17a.html (15) Xu, Z., Dai, A.M., Kemp, J., Metz, L.: Learning an adaptive learning rate schedule. CoRR abs/1909.09712 (2019) arXiv:1909.09712 (16) Shu, J., Zhu, Y., Zhao, Q., Meng, D., Xu, Z.: Meta-LR-Schedule-Net: Learned LR Schedules that Scale and Generalize (2020) (17) Daniel, C., Taylor, J., Nowozin, S.: Learning step size controllers for robust neural network training. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (18) Wu, Y., Ren, M., Liao, R., Grosse, R.B.: Understanding short-horizon bias in stochastic meta-optimization. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, Vancouver, BC, Canada (2018). https://openreview.net/forum?id=H1MczcgR- (19) Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12(61), 2121–2159 (2011) (20) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6980 (21) Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. 
NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 
2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). 
https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). 
https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12(61), 2121–2159 (2011) (20) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6980 (21) Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. 
NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 
2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). 
https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. 
Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. 
Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6980 (21) Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. 
In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. 
http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). 
http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. 
NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 
2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). 
Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). 
https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). 
https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). 
http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). 
https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
Accessed: 2020-11-04 Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) 
Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. 
https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) 
Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. 
In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. 
Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). 
https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. 
In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. 
Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). 
Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). 
https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. 
Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. 
Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
Accessed: 2020-11-04 Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04
  9. Mahsereci, M., Hennig, P.: Probabilistic line searches for stochastic optimization. The Journal of Machine Learning Research 18(1), 4262–4320 (2017) (11) Zoph, B., Le, Q.V.: Neural architecture search with reinforcement learning. In: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net, Toulon, France (2017). https://openreview.net/forum?id=r1Ue8Hcxg (12) Pham, H., Guan, M.Y., Zoph, B., Le, Q.V., Dean, J.: Efficient neural architecture search via parameter sharing. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018. Proceedings of Machine Learning Research, vol. 80, pp. 4092–4101. PMLR, Stockholm, Sweden (2018). http://proceedings.mlr.press/v80/pham18a.html (13) Liu, H., Simonyan, K., Yang, Y.: DARTS: differentiable architecture search. In: 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=S1eYHoC5FX (14) Bello, I., Zoph, B., Vasudevan, V., Le, Q.V.: Neural optimizer search with reinforcement learning. In: Precup, D., Teh, Y.W. (eds.) Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017. Proceedings of Machine Learning Research, vol. 70, pp. 459–468. PMLR, Sydney, NSW, Australia (2017). http://proceedings.mlr.press/v70/bello17a.html (15) Xu, Z., Dai, A.M., Kemp, J., Metz, L.: Learning an adaptive learning rate schedule. CoRR abs/1909.09712 (2019) arXiv:1909.09712 (16) Shu, J., Zhu, Y., Zhao, Q., Meng, D., Xu, Z.: Meta-LR-Schedule-Net: Learned LR Schedules that Scale and Generalize (2020) (17) Daniel, C., Taylor, J., Nowozin, S.: Learning step size controllers for robust neural network training. 
In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (18) Wu, Y., Ren, M., Liao, R., Grosse, R.B.: Understanding short-horizon bias in stochastic meta-optimization. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, Vancouver, BC, Canada (2018). https://openreview.net/forum?id=H1MczcgR- (19) Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12(61), 2121–2159 (2011) (20) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6980 (21) Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). 
http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. 
In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. 
In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. 
https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Zoph, B., Le, Q.V.: Neural architecture search with reinforcement learning. 
In: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net, Toulon, France (2017). https://openreview.net/forum?id=r1Ue8Hcxg (12) Pham, H., Guan, M.Y., Zoph, B., Le, Q.V., Dean, J.: Efficient neural architecture search via parameter sharing. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018. Proceedings of Machine Learning Research, vol. 80, pp. 4092–4101. PMLR, Stockholm, Sweden (2018). http://proceedings.mlr.press/v80/pham18a.html (13) Liu, H., Simonyan, K., Yang, Y.: DARTS: differentiable architecture search. In: 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=S1eYHoC5FX (14) Bello, I., Zoph, B., Vasudevan, V., Le, Q.V.: Neural optimizer search with reinforcement learning. In: Precup, D., Teh, Y.W. (eds.) Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017. Proceedings of Machine Learning Research, vol. 70, pp. 459–468. PMLR, Sydney, NSW, Australia (2017). http://proceedings.mlr.press/v70/bello17a.html (15) Xu, Z., Dai, A.M., Kemp, J., Metz, L.: Learning an adaptive learning rate schedule. CoRR abs/1909.09712 (2019) arXiv:1909.09712 (16) Shu, J., Zhu, Y., Zhao, Q., Meng, D., Xu, Z.: Meta-LR-Schedule-Net: Learned LR Schedules that Scale and Generalize (2020) (17) Daniel, C., Taylor, J., Nowozin, S.: Learning step size controllers for robust neural network training. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (18) Wu, Y., Ren, M., Liao, R., Grosse, R.B.: Understanding short-horizon bias in stochastic meta-optimization. 
In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, Vancouver, BC, Canada (2018). https://openreview.net/forum?id=H1MczcgR- (19) Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12(61), 2121–2159 (2011) (20) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6980 (21) Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. 
In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. 
COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. 
In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) 
Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. 
https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) 
Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Shu, J., Zhu, Y., Zhao, Q., Meng, D., Xu, Z.: Meta-LR-Schedule-Net: Learned LR Schedules that Scale and Generalize (2020) (17) Daniel, C., Taylor, J., Nowozin, S.: Learning step size controllers for robust neural network training. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (18) Wu, Y., Ren, M., Liao, R., Grosse, R.B.: Understanding short-horizon bias in stochastic meta-optimization. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, Vancouver, BC, Canada (2018). https://openreview.net/forum?id=H1MczcgR- (19) Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12(61), 2121–2159 (2011) (20) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). 
http://arxiv.org/abs/1412.6980 (21) Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). 
http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. 
https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. 
https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). 
https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Daniel, C., Taylor, J., Nowozin, S.: Learning step size controllers for robust neural network training. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (18) Wu, Y., Ren, M., Liao, R., Grosse, R.B.: Understanding short-horizon bias in stochastic meta-optimization. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, Vancouver, BC, Canada (2018). https://openreview.net/forum?id=H1MczcgR- (19) Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12(61), 2121–2159 (2011) (20) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 
3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6980 (21) Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 
3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. 
IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. 
https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). 
Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) 
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. 
JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. 
In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. 
In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. 
https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
Accessed: 2020-11-04 Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. 
In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. 
In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) 
Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). 
https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. 
https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). 
Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). 
http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). 
https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
Accessed: 2020-11-04 Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. 
https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). 
https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). 
http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). 
http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) 
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. 
Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. 
  10. Zoph, B., Le, Q.V.: Neural architecture search with reinforcement learning. In: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net, Toulon, France (2017). https://openreview.net/forum?id=r1Ue8Hcxg
  11. Pham, H., Guan, M.Y., Zoph, B., Le, Q.V., Dean, J.: Efficient neural architecture search via parameter sharing. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018. Proceedings of Machine Learning Research, vol. 80, pp. 4092–4101. PMLR, Stockholm, Sweden (2018). http://proceedings.mlr.press/v80/pham18a.html
  12. Liu, H., Simonyan, K., Yang, Y.: DARTS: differentiable architecture search. In: 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=S1eYHoC5FX
In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. 
https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). 
http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). 
https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
Accessed: 2020-11-04 Bello, I., Zoph, B., Vasudevan, V., Le, Q.V.: Neural optimizer search with reinforcement learning. In: Precup, D., Teh, Y.W. (eds.) Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017. Proceedings of Machine Learning Research, vol. 70, pp. 459–468. PMLR, Sydney, NSW, Australia (2017). http://proceedings.mlr.press/v70/bello17a.html (15) Xu, Z., Dai, A.M., Kemp, J., Metz, L.: Learning an adaptive learning rate schedule. CoRR abs/1909.09712 (2019) arXiv:1909.09712 (16) Shu, J., Zhu, Y., Zhao, Q., Meng, D., Xu, Z.: Meta-LR-Schedule-Net: Learned LR Schedules that Scale and Generalize (2020) (17) Daniel, C., Taylor, J., Nowozin, S.: Learning step size controllers for robust neural network training. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (18) Wu, Y., Ren, M., Liao, R., Grosse, R.B.: Understanding short-horizon bias in stochastic meta-optimization. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, Vancouver, BC, Canada (2018). https://openreview.net/forum?id=H1MczcgR- (19) Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12(61), 2121–2159 (2011) (20) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6980 (21) Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. 
NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 
2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). 
https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. 
Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. 
Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Xu, Z., Dai, A.M., Kemp, J., Metz, L.: Learning an adaptive learning rate schedule. CoRR abs/1909.09712 (2019) arXiv:1909.09712 (16) Shu, J., Zhu, Y., Zhao, Q., Meng, D., Xu, Z.: Meta-LR-Schedule-Net: Learned LR Schedules that Scale and Generalize (2020) (17) Daniel, C., Taylor, J., Nowozin, S.: Learning step size controllers for robust neural network training. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (18) Wu, Y., Ren, M., Liao, R., Grosse, R.B.: Understanding short-horizon bias in stochastic meta-optimization. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, Vancouver, BC, Canada (2018). https://openreview.net/forum?id=H1MczcgR- (19) Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12(61), 2121–2159 (2011) (20) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6980 (21) Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. 
COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. 
In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) 
Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. 
Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6980 (21) Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. 
In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. 
http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). 
http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. 
NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 
2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). 
https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. 
Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. 
Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). 
https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. 
Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) 
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. 
https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) 
Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. 
In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. 
Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). 
https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. 
In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. 
Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). 
https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) 
Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. 
https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) 
Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). 
http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). 
  11. Pham, H., Guan, M.Y., Zoph, B., Le, Q.V., Dean, J.: Efficient neural architecture search via parameter sharing. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018. Proceedings of Machine Learning Research, vol. 80, pp. 4092–4101. PMLR, Stockholm, Sweden (2018). http://proceedings.mlr.press/v80/pham18a.html
http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). 
http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Shu, J., Zhu, Y., Zhao, Q., Meng, D., Xu, Z.: Meta-LR-Schedule-Net: Learned LR Schedules that Scale and Generalize (2020) (17) Daniel, C., Taylor, J., Nowozin, S.: Learning step size controllers for robust neural network training. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 
30 (2016) (18) Wu, Y., Ren, M., Liao, R., Grosse, R.B.: Understanding short-horizon bias in stochastic meta-optimization. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, Vancouver, BC, Canada (2018). https://openreview.net/forum?id=H1MczcgR- (19) Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12(61), 2121–2159 (2011) (20) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6980 (21) Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). 
http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. 
In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. 
In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. 
https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
Accessed: 2020-11-04 Daniel, C., Taylor, J., Nowozin, S.: Learning step size controllers for robust neural network training. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (18) Wu, Y., Ren, M., Liao, R., Grosse, R.B.: Understanding short-horizon bias in stochastic meta-optimization. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, Vancouver, BC, Canada (2018). https://openreview.net/forum?id=H1MczcgR- (19) Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12(61), 2121–2159 (2011) (20) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6980 (21) Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. 
JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. 
https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). 
http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). 
https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). 
https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
Accessed: 2020-11-04 Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. 
IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. 
https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. 
Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. 
Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. 
In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. 
Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. 
ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). 
http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. 
https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. 
https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. 
https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. 
Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. 
Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) 
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. 
In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. 
https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
Accessed: 2020-11-04 Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). 
https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
Accessed: 2020-11-04 Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). 
https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. 
ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). 
http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
  12. Liu, H., Simonyan, K., Yang, Y.: DARTS: differentiable architecture search. In: 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net, New Orleans, LA, USA (2019). https://openreview.net/forum?id=S1eYHoC5FX
In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) 
Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). 
https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Shu, J., Zhu, Y., Zhao, Q., Meng, D., Xu, Z.: Meta-LR-Schedule-Net: Learned LR Schedules that Scale and Generalize (2020) (17) Daniel, C., Taylor, J., Nowozin, S.: Learning step size controllers for robust neural network training. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (18) Wu, Y., Ren, M., Liao, R., Grosse, R.B.: Understanding short-horizon bias in stochastic meta-optimization. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, Vancouver, BC, Canada (2018). https://openreview.net/forum?id=H1MczcgR- (19) Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12(61), 2121–2159 (2011) (20) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6980 (21) Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). 
http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. 
In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. 
In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) 
Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). 
https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Daniel, C., Taylor, J., Nowozin, S.: Learning step size controllers for robust neural network training. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (18) Wu, Y., Ren, M., Liao, R., Grosse, R.B.: Understanding short-horizon bias in stochastic meta-optimization. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, Vancouver, BC, Canada (2018). https://openreview.net/forum?id=H1MczcgR- (19) Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12(61), 2121–2159 (2011) (20) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6980 (21) Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. 
In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. 
https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). 
http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). 
https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
Accessed: 2020-11-04 Wu, Y., Ren, M., Liao, R., Grosse, R.B.: Understanding short-horizon bias in stochastic meta-optimization. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, Vancouver, BC, Canada (2018). https://openreview.net/forum?id=H1MczcgR- (19) Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12(61), 2121–2159 (2011) (20) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6980 (21) Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). 
https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
Accessed: 2020-11-04 Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. 
IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. 
https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. 
Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. 
Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. 
In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. 
Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. 
ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). 
http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. 
https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. 
https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. 
Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. 
Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. 
https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. 
Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. 
Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) 
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. 
In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. 
https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
Accessed: 2020-11-04 Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). 
https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
Accessed: 2020-11-04 Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). 
https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. 
ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). 
http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. 
https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
Accessed: 2020-11-04 Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. 
Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. 
In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
Accessed: 2020-11-04 Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). 
http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. 
https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04
  13. Bello, I., Zoph, B., Vasudevan, V., Le, Q.V.: Neural optimizer search with reinforcement learning. In: Precup, D., Teh, Y.W. (eds.) Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017. Proceedings of Machine Learning Research, vol. 70, pp. 459–468. PMLR, Sydney, NSW, Australia (2017). http://proceedings.mlr.press/v70/bello17a.html
https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Daniel, C., Taylor, J., Nowozin, S.: Learning step size controllers for robust neural network training. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (18) Wu, Y., Ren, M., Liao, R., Grosse, R.B.: Understanding short-horizon bias in stochastic meta-optimization. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, Vancouver, BC, Canada (2018). https://openreview.net/forum?id=H1MczcgR- (19) Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12(61), 2121–2159 (2011) (20) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 
3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6980 (21) Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 
3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. 
IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. 
https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). 
https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Wu, Y., Ren, M., Liao, R., Grosse, R.B.: Understanding short-horizon bias in stochastic meta-optimization. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, Vancouver, BC, Canada (2018). https://openreview.net/forum?id=H1MczcgR- (19) Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12(61), 2121–2159 (2011) (20) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). 
http://arxiv.org/abs/1412.6980 (21) Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). 
http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. 
https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. 
https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). 
https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12(61), 2121–2159 (2011) (20) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6980 (21) Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. 
NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 
2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). 
In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. 
In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) 
Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). 
https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. 
https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). 
http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). 
https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
Accessed: 2020-11-04 Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. 
In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. 
Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). 
https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). 
https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). 
http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). 
http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) 
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. 
Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. 
Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). 
https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. 
Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) 
Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). 
https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. 
Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. 
Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
Accessed: 2020-11-04 Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04
  14. Xu, Z., Dai, A.M., Kemp, J., Metz, L.: Learning an adaptive learning rate schedule. CoRR abs/1909.09712 (2019). arXiv:1909.09712
Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). 
https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Shu, J., Zhu, Y., Zhao, Q., Meng, D., Xu, Z.: Meta-LR-Schedule-Net: Learned LR Schedules that Scale and Generalize (2020) (17) Daniel, C., Taylor, J., Nowozin, S.: Learning step size controllers for robust neural network training. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (18) Wu, Y., Ren, M., Liao, R., Grosse, R.B.: Understanding short-horizon bias in stochastic meta-optimization. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, Vancouver, BC, Canada (2018). https://openreview.net/forum?id=H1MczcgR- (19) Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12(61), 2121–2159 (2011) (20) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6980 (21) Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). 
http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. 
In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. 
In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) 
Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). 
https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Daniel, C., Taylor, J., Nowozin, S.: Learning step size controllers for robust neural network training. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016) (18) Wu, Y., Ren, M., Liao, R., Grosse, R.B.: Understanding short-horizon bias in stochastic meta-optimization. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, Vancouver, BC, Canada (2018). https://openreview.net/forum?id=H1MczcgR- (19) Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12(61), 2121–2159 (2011) (20) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6980 (21) Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. 
In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. 
https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). 
http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). 
https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
Accessed: 2020-11-04 Wu, Y., Ren, M., Liao, R., Grosse, R.B.: Understanding short-horizon bias in stochastic meta-optimization. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, Vancouver, BC, Canada (2018). https://openreview.net/forum?id=H1MczcgR- (19) Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12(61), 2121–2159 (2011) (20) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6980 (21) Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). 
http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. 
Accessed: 2020-11-04 Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. 
IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. 
https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. 
Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. 
Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. 
In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. 
Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. 
ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). 
http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. 
https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. 
https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. 
Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. 
Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. 
Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) 
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. 
In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. 
https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
Accessed: 2020-11-04 Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). 
https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
Accessed: 2020-11-04 Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). 
https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. 
ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). 
http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. 
https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
  15. Shu, J., Zhu, Y., Zhao, Q., Meng, D., Xu, Z.: Meta-LR-Schedule-Net: Learned LR Schedules that Scale and Generalize (2020)
8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. 
Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) 
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6980 (21) Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. 
NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 
2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). 
https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. 
Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. 
Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 
8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. 
Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) 
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. 
https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. 
Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. 
Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. 
In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. 
Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). 
https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. 
http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). 
http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. 
http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). 
http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. 
In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. 
http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. 
https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04
  16. Daniel, C., Taylor, J., Nowozin, S.: Learning step size controllers for robust neural network training. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016)
2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). 
https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. 
Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. 
Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 
8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. 
Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) 
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. 
In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. 
https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). 
http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). 
https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
Accessed: 2020-11-04 Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. 
https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. 
http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). 
http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. 
http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). 
http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. 
In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. 
Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). 
https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. 
https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. 
Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. 
Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) 
  17. Wu, Y., Ren, M., Liao, R., Grosse, R.B.: Understanding short-horizon bias in stochastic meta-optimization. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, Vancouver, BC, Canada (2018). https://openreview.net/forum?id=H1MczcgR-
https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
Accessed: 2020-11-04 Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12(61), 2121–2159 (2011) (20) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6980 (21) Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012) (22) Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011) (23) Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 
http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). 
https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
Accessed: 2020-11-04 Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. 
IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. 
https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. 
Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. 
Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. 
In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. 
Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. 
ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). 
http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. 
https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. 
https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. 
https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. 
Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. 
Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) 
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. 
In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. 
https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
Accessed: 2020-11-04 Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). 
https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
Accessed: 2020-11-04 Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). 
https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. 
ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). 
http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
  18. Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12(61), 2121–2159 (2011)
IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). 
http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 
2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). 
https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. 
Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. 
Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. 
In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. 
Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). 
https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. 
http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). 
http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. 
http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). 
http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04
Accessed: 2020-11-04 Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. 
Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. 
In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
Accessed: 2020-11-04 Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). 
http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. 
https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04
  19. Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6980
https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) 
Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). 
https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. 
Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) 
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. 
(eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. 
http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). 
http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. 
http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). 
http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). 
https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
Accessed: 2020-11-04 Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. 
https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). 
https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). 
http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). 
http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) 
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. 
Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. 
Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). 
  20. Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012)
In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. 
Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. 
ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). 
http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html (26) Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. 
https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. 
https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. 
Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. 
Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572 (27) Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. 
IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). 
http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 
2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). 
https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. 
Accessed: 2020-11-04 Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). 
https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
Accessed: 2020-11-04 Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). 
https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. 
ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). 
http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. 
https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
Accessed: 2020-11-04 Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. 
Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. 
In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
Accessed: 2020-11-04 Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). 
http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. 
https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04
  21. Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading Digits in Natural Images with Unsupervised Feature Learning. NIPS Deep Learning and Unsupervised Feature Learning Workshop (2011)
Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 (24) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html (25) Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). 
https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
Accessed: 2020-11-04 Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. 
https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). 
https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). 
http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). 
http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) 
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. 
Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. 
Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). 
https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. 
  22. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556
Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014) (28) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 
2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). 
https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. 
Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. 
Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. 
In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. 
Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). 
https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. 
http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). 
http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. 
http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). 
http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. 
Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. 
In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
Accessed: 2020-11-04 Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). 
http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. 
https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04
  23. Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456. JMLR.org, Lille, France (2015). http://proceedings.mlr.press/v37/ioffe15.html
https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
Accessed: 2020-11-04 Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) 
Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. 
https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) 
Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. 
In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. 
Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). 
https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. 
In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. 
Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). 
https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) 
Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. 
https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) 
Accessed: 2020-11-04 Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04
  24. Verma, G., Swami, A.: Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pp. 8643–8653 (2019). https://papers.nips.cc/paper/2019/hash/cd61a580392a70389e27b0bc2b439f49-Abstract.html
https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. 
In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. 
Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). 
https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) 
Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. 
https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) 
Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). 
http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). 
https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
Accessed: 2020-11-04 Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. 
https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). 
https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). 
http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
  25. Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6572
Accessed: 2020-11-04 Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243. https://doi.org/10.1109/CVPR.2017.243 (29) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) 
Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. 
https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) 
Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308. https://doi.org/10.1109/CVPR.2016.308 (30) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. 
In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. 
Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). 
https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html (31) Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. 
In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. 
Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). 
https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) 
Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. 
https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) 
Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). 
  26. Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: European Conference on Computer Vision (2014)
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. 
In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) 
Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). 
https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. 
ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). 
http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. 
ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). 
http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) 
Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). 
https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). 
https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. 
https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. 
http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. 
  27. Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 2261–2269. IEEE Computer Society, Honolulu, HI, USA (2017). https://doi.org/10.1109/CVPR.2017.243
Accessed: 2020-11-04 Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. 
https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). 
https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). 
http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). 
http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) 
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. 
Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. 
Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). 
https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. 
Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) 
  28. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 2818–2826. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.308
http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. 
ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). 
http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) 
Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). 
https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). 
https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. 
https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. 
http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. 
Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. 
BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. 
In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. 
https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. 
https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
Accessed: 2020-11-04 Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04
  29. Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 7482–7491. IEEE Computer Society, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00781. http://openaccess.thecvf.com/content_cvpr_2018/html/Kendall_Multi-Task_Learning_Using_CVPR_2018_paper.html
Accessed: 2020-11-04 Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010) (32) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. 
In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. 
In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90. https://doi.org/10.1109/CVPR.2016.90 (33) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) 
Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. 
https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) 
Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., III, W.M.W., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28. https://doi.org/10.1007/978-3-319-24574-4_28 (34) Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). 
http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). 
http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). 
https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
Accessed: 2020-11-04 Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). 
https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
Accessed: 2020-11-04 Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). 
  30. Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010)
https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
Accessed: 2020-11-04 Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). 
https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. 
ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). 
http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. 
https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
Accessed: 2020-11-04 Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. 
Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. 
In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
Accessed: 2020-11-04 Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). 
http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. 
https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04
  31. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society, Las Vegas, NV, USA (2016). https://doi.org/10.1109/CVPR.2016.90
https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015 (35) Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. 
ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). 
http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html (36) Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. 
ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). 
http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423. https://doi.org/10.18653/v1/n19-1423 (37) Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) 
Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). 
https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050 (38) Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). 
https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Koch, Gregory, Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015) (39) Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. 
https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. 
http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04 (40) Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. 
Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. 
BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. 
In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. 
  32. Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells III, W.M., Frangi, A.F. (eds.) Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference Munich, Germany, October 5 - 9, 2015, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Munich, Germany (2015). https://doi.org/10.1007/978-3-319-24574-4_28
https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04
  33. Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 142–150. Association for Computational Linguistics, Portland, Oregon, USA (2011). http://www.aclweb.org/anthology/P11-1015
  34. Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.R.: A conversational movie search system based on conditional random fields. In: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp. 2454–2457. ISCA, Portland, Oregon, USA (2012). http://www.isca-speech.org/archive/interspeech_2012/i12_2454.html
  35. Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/n19-1423
https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
Accessed: 2020-11-04 Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/ (41) Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. 
Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014) (42) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. 
In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735 (43) Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. 
Accessed: 2020-11-04 Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html (44) Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Chicco, D., Jurman, G.: The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020) (45) Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). 
http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. 
https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04
  36. Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015). https://science.sciencemag.org/content/350/6266/1332.full.pdf. https://doi.org/10.1126/science.aab3050
  37. Koch, G., Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop (2015)
  38. Karpathy, A.: The Unreasonable Effectiveness of Recurrent Neural Networks. http://karpathy.github.io/2015/05/21/rnn-effectiveness/. Accessed: 2020-11-04
  39. Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. In: Wu, D., Carpuat, M., Carreras, X., Vecchi, E.M. (eds.) Proceedings of SSST@EMNLP 2014, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/W14-4012. https://www.aclweb.org/anthology/W14-4012/
http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html (46) Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04 (47) Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. 
https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324. https://doi.org/10.1109/ICCV.2017.324 (48) Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04 Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04
  40. Jaeger, S., Candemir, S., Antani, S., Wang, Y.X., Lu, P.X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg 4(6), 475–477 (2014)
  41. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9, 1735–80 (1997). https://doi.org/10.1162/neco.1997.9.8.1735
  42. Kumar, A., Liang, P., Ma, T.: Verified uncertainty calibration. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 3787–3798 (2019). https://proceedings.neurips.cc/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html
  43. Chicco, D., Jurman, G.: The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC genomics 21(1), 1–13 (2020)
  44. Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, York, UK (2016). http://www.bmva.org/bmvc/2016/papers/paper087/index.html
  45. Page, D.C.: Cifar10-Fast Dawn Benchmark Implementation. https://github.com/davidcpage/cifar10-fast. Accessed: 2020-11-04
  46. Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 2999–3007. IEEE Computer Society, Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.324
  47. Marcus, M.P.: Treebank-3 LDC99T42. Web Download. https://catalog.ldc.upenn.edu/LDC99T42. Accessed: 2020-11-04
Citations (2)
