Shanghai, China
June 24–26, 2019


Simultaneous translation will be provided for all keynote and breakout sessions.

Tuesday, June 25 • 13:35 - 14:10
Co-locating CPU and GPU Workloads for Efficient Resource Utilization - Penghao Cen, Ant Financial; Jian He, Alibaba

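Users run many kinds of workloads in Kubernetes, including long-running services and AI batch jobs. GPU machines are typically dedicated to AI training, so their resource utilization can be very low at times.

Have you considered co-locating different types of workloads on the same node to save machines and cost?

This talk shares our co-location practices and experience in Kubernetes clusters.

Specifically:
Why and how do we create a new QoS class on top of BestEffort?
Why and how do we create a node-level cgroup for batch jobs?
How do we use a CRD named PodGroup to implement gang scheduling?
How do we evaluate utilization?

Over the past few months, we built a co-located cluster with more than 100 GPU (NVIDIA Tesla P100) nodes and more than 500 CPU nodes. We co-deployed long-running services and AI batch jobs and achieved a 10% increase in utilization.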
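
For readers unfamiliar with gang scheduling, one of the topics listed in the abstract, the idea can be sketched as all-or-nothing placement: either every pod in a group gets a node, or none does. The sketch below is only an illustration under stated assumptions. It is not the speakers' implementation and does not use the Kubernetes API; the names `Pod` and `gang_schedule`, and the choice to model capacity as free GPUs per node, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Pod:
    name: str
    gpus: int  # GPUs requested by this pod (hypothetical, simplified resource model)


def gang_schedule(pods: List[Pod], free_gpus: Dict[str, int]) -> Optional[Dict[str, str]]:
    """Place every pod in the group, or none of them (all-or-nothing).

    Returns a pod-name -> node-name mapping, or None if any member does not fit.
    """
    remaining = dict(free_gpus)  # work on a copy so a failed attempt has no side effects
    placement: Dict[str, str] = {}
    # Largest requests first, each on the first node that fits (a simple greedy heuristic).
    for pod in sorted(pods, key=lambda p: p.gpus, reverse=True):
        node = next((n for n, free in remaining.items() if free >= pod.gpus), None)
        if node is None:
            return None  # one member does not fit, so the whole group waits
        remaining[node] -= pod.gpus
        placement[pod.name] = node
    return placement


group = [Pod("worker-0", 2), Pod("worker-1", 2), Pod("ps-0", 1)]
print(gang_schedule(group, {"node-a": 2, "node-b": 3}))
print(gang_schedule(group, {"node-a": 2, "node-b": 2}))
```

The greedy heuristic keeps the sketch short, but it can reject a group that a smarter packing would fit. The point it illustrates is only the all-or-nothing behavior: a distributed training job is never left with some of its members running while the rest wait for resources.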

Speakers

Penghao Cen

Senior Engineer, Ant Financial
Penghao Cen is a Senior Engineer at Ant Financial (formerly known as Alipay). He is currently an active contributor and member of the Kubernetes and Kubeflow communities, focusing on resource management and scheduling. He primarily contributes to the kubeflow/tf-operator project (Tools for MachineLearning/Tensorflow...

Jian He

Staff Engineer, Alibaba
Jian He is a Staff Engineer at Alibaba, where he works on container infrastructure to support Alibaba's massive workloads globally. Prior to that, he worked on the Hortonworks Hadoop team, where he primarily contributed to the Hadoop open source community, and he is also a Hadoop committer and...



Tuesday June 25, 2019 13:35 - 14:10 CST
430