June 24–26, 2019

Simultaneous translation will be provided for all keynote and breakout sessions.

Tuesday, June 25 • 16:00 - 16:35
Large-Scale Distributed Deep Learning on Kubernetes Clusters - Yuan Tang, Ant Financial; Yong Tang, MobileIron

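This talk focuses on deploying large-scale distributed deep learning on Kubernetes. It also covers how operators can be used to manage and automate the machine learning training process. We will share our experience and compare two open source Kubernetes operators: tf-operator and mpi-operator. Both operators manage training jobs for TensorFlow, but they use different distribution strategies, which leads to different performance results in terms of CPU, GPU, and network utilization.

Deep learning jobs are both network-intensive and GPU-intensive, so it is important to optimize orchestration appropriately. Imbalances arise easily and leave compute capacity idle, which is far more costly on GPU nodes than on CPU nodes. We will share our experience in the hope of offering useful insights that help get better economic value from machine learning workloads.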

Yuan Tang

Senior Software Engineer, Ant Financial
Yuan is currently a senior software engineer at Ant Financial, building AI infrastructure and an AutoML platform. He's a committer of TensorFlow, XGBoost, and Apache MXNet, a maintainer of several Kubeflow projects, and the author of numerous open source software projects. He's also the author of best-selling...

Yong Tang

Director of Engineering, Ivanti
Yong Tang is the director of engineering at Ivanti. He is a core maintainer of CoreDNS and contributes to many container, cloud-native, and machine learning projects for the open source community. In addition to CoreDNS, he is a maintainer of Docker/Moby. He is also a maintainer and...

Tuesday June 25, 2019 16:00 - 16:35 CST