Notebook - Big Data Development and Governance Suite - Volcano Engine

Activity (task node) types

Notebook

This page describes how to configure a Notebook-type Activity, which runs a Notebook file within a Pipeline.

Overview

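A Notebook Activity runs an existing Notebook file. The Notebook may mix SQL Cells with Python/Scala Cells, and all of its Cells run sequentially in order.
Typical use cases:

Multi-step tasks that combine logic such as data exploration, cleansing, and transformation
Scenarios that require switching flexibly between SQL and Python
Tasks that need visual analysis or output of intermediate results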

Configuration example

```yaml
- name: data_analysis
  type: notebook
  source: WORKSPACE
  path: /Workspace/Users/zhang3/notebooks/analysis.notebook
  sqlEngineType: emr_serverless_spark
  sqlEngineQueue: default
  sqlComputingResourceGroupName: default_sql_group
  sqlComputingResourceGroupId: 1
  generalComputingResourceGroupName: default_py_group
  generalComputingResourceGroupId: 2
  parameterValues:
    biz_date: "{{pipeline.parameters.biz_date}}"
    region: "cn-beijing"
  retryPolicy:
    maxRetries: 2
    minRetryIntervalMillis: 60000
  position:
    x: "200"
    y: "100"
```

Field reference

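| Field | Type | Required | Description |
|---|---|---|---|
| `type` | String | Yes | Fixed value: `notebook` |
| `source` | Enum | Yes | Code source; fixed value: `WORKSPACE` |
| `path` | String | Yes | Path of the Notebook file in the workspace |
| `sqlEngineType` | Enum | Conditional | Compute engine type for SQL Cells |
| `sqlEngineQueue` | String | Conditional | Compute queue name for SQL Cells |
| `sqlComputingResourceGroupName` | String | Conditional | Resource group name for SQL Cells |
| `sqlComputingResourceGroupId` | Long | Conditional | Resource group ID for SQL Cells |
| `generalComputingResourceGroupName` | String | Conditional | Resource group name for Python/Scala Cells |
| `generalComputingResourceGroupId` | Long | Conditional | Resource group ID for Python/Scala Cells |
| `parameterValues` | Map | No | Parameter values (key-value pairs) injected into the Notebook execution context |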
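You can verify the always-required fields and their fixed values in a few lines before submitting a Pipeline. The following is a minimal sketch of a hypothetical client-side helper. `check_basic_fields` is not part of the product API. The sketch assumes the Activity definition has already been loaded into a Python dict, and it does not cover the conditionally required resource fields.

```python
from typing import Any

# Hypothetical pre-submission check; not the platform's own validator.
REQUIRED_FIELDS = ("type", "source", "path")
FIXED_VALUES = {"type": "notebook", "source": "WORKSPACE"}


def check_basic_fields(activity: dict[str, Any]) -> list[str]:
    """Return a list of problems found in the always-required fields."""
    problems: list[str] = []
    # Every required field must be present and non-empty.
    for field in REQUIRED_FIELDS:
        if not activity.get(field):
            problems.append(f"missing required field: {field}")
    # type and source only accept one fixed value each.
    for field, expected in FIXED_VALUES.items():
        value = activity.get(field)
        if value is not None and value != expected:
            problems.append(f"{field} must be {expected!r}, got {value!r}")
    # parameterValues is optional, but when present it must be a key-value map.
    params = activity.get("parameterValues", {})
    if not isinstance(params, dict):
        problems.append("parameterValues must be a key-value map")
    return problems


example = {
    "type": "notebook",
    "source": "WORKSPACE",
    "path": "/Workspace/Users/zhang3/notebooks/analysis.notebook",
}
print(check_basic_fields(example))  # []
```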

Resource group configuration rules

The resource groups a Notebook needs depend on the types of Cells it contains:

| Cell types in the Notebook | Resources to configure |
|---|---|
| SQL Cells only | `sqlEngineType` + `sqlEngineQueue` + `sqlComputingResourceGroupName/Id` |
| Python/Scala Cells only | `generalComputingResourceGroupName/Id` |
| Mixed SQL and Python/Scala | All of the above |
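The resource group rules fit into a small lookup. This sketch rests on two assumptions. First, Cell types are labeled with the hypothetical strings `"sql"`, `"python"`, and `"scala"`. Second, `Name/Id` in the table means both the name field and the ID field must be set, as in the configuration example. The helper names are illustrative only.

```python
# Field groups taken from the rules table; helper names are hypothetical.
SQL_FIELDS = (
    "sqlEngineType",
    "sqlEngineQueue",
    "sqlComputingResourceGroupName",
    "sqlComputingResourceGroupId",
)
GENERAL_FIELDS = (
    "generalComputingResourceGroupName",
    "generalComputingResourceGroupId",
)


def required_resource_fields(cell_types: set[str]) -> list[str]:
    """Map the Cell types a Notebook contains to the resource fields it needs."""
    fields: list[str] = []
    if "sql" in cell_types:
        fields.extend(SQL_FIELDS)
    # Any Python or Scala Cell needs the general-purpose resource group.
    if cell_types & {"python", "scala"}:
        fields.extend(GENERAL_FIELDS)
    return fields


def missing_resource_fields(activity: dict, cell_types: set[str]) -> list[str]:
    """Return the required resource fields that are absent or empty."""
    return [
        field
        for field in required_resource_fields(cell_types)
        if activity.get(field) in (None, "")
    ]
```

A mixed Notebook therefore needs all six fields, which is why the configuration example sets both the SQL and the general-purpose resource groups.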

sqlEngineType values

| Value | Description |
|---|---|
| `emr_serverless_spark` | EMR Serverless Spark engine |
| `presto` | Presto engine |
| `bytehouse` | ByteHouse engine |
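To catch typos in `sqlEngineType` before a run, you can mirror the accepted values in an enum. This is a hypothetical sketch. It assumes values are matched exactly and case-sensitively, which this page does not state explicitly.

```python
from enum import Enum


class SqlEngineType(str, Enum):
    """Accepted sqlEngineType values, copied from the values table."""

    EMR_SERVERLESS_SPARK = "emr_serverless_spark"
    PRESTO = "presto"
    BYTEHOUSE = "bytehouse"


def parse_sql_engine_type(value: str) -> SqlEngineType:
    """Look up a config value, raising a readable error for unknown engines."""
    try:
        return SqlEngineType(value)
    except ValueError:
        allowed = ", ".join(engine.value for engine in SqlEngineType)
        raise ValueError(
            f"unsupported sqlEngineType {value!r}; expected one of: {allowed}"
        ) from None


print(parse_sql_engine_type("presto").value)  # presto
```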

Parameter passing

Use `parameterValues` to pass parameters into the Notebook. Inside the Notebook, each parameter can be used directly as a variable:

```yaml
parameterValues:
  biz_date: "{{pipeline.parameters.biz_date}}"
  threshold: "100"
```
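The value `{{pipeline.parameters.biz_date}}` refers to the Pipeline-level parameter `biz_date`. The sketch below only illustrates that resolution step. The regex, the helper name `resolve_parameter_values`, and the choice to fail on an undefined Pipeline parameter are all assumptions, not the platform's implementation.

```python
import re

# Matches placeholders such as {{pipeline.parameters.biz_date}};
# tolerating inner whitespace is an assumption of this sketch.
PIPELINE_PARAM = re.compile(r"\{\{\s*pipeline\.parameters\.(\w+)\s*\}\}")


def resolve_parameter_values(
    parameter_values: dict[str, str], pipeline_parameters: dict[str, str]
) -> dict[str, str]:
    """Replace Pipeline parameter placeholders; literal values pass through unchanged."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in pipeline_parameters:
            # This sketch fails fast; the platform's behavior is not documented here.
            raise KeyError(f"pipeline parameter not defined: {name}")
        return pipeline_parameters[name]

    return {
        key: PIPELINE_PARAM.sub(substitute, value)
        for key, value in parameter_values.items()
    }


resolved = resolve_parameter_values(
    {"biz_date": "{{pipeline.parameters.biz_date}}", "threshold": "100"},
    {"biz_date": "2026-06-01"},
)
print(resolved)  # {'biz_date': '2026-06-01', 'threshold': '100'}
```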

Referencing the parameters in a Notebook Python Cell:

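```python
# Parameters are injected automatically as variables with the same names
print(biz_date)     # Output: 2026-06-01 (when biz_date resolves to this date)
print(threshold)    # Output: 100
```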
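The values in `parameterValues` are written as quoted YAML strings, so it is safest to assume they arrive in the Python Cell as strings. This page does not state the injected type. Convert them once before doing arithmetic or date logic. In the sketch below, the first two assignments are stand-ins so that it runs on its own. Remove them in a real Notebook, where the injected variables already exist.

```python
from datetime import date

# Stand-ins for the injected variables (assumption: they arrive as strings).
# Delete these two lines inside a real Notebook run.
biz_date = "2026-06-01"
threshold = "100"

# Convert once at the top of the Notebook. int() and str() also accept
# values that already have the target type, so this works either way.
threshold_value = int(threshold)
biz_day = date.fromisoformat(str(biz_date))

print(threshold_value + 1)         # 101
print(biz_day.strftime("%Y%m%d"))  # 20260601
```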

Referencing the parameters in a Notebook SQL Cell:

```sql
SELECT * FROM orders WHERE dt = '${biz_date}'
```
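The `${biz_date}` reference reads like a plain text substitution performed before the SQL runs. If that is how it works, Python's `string.Template`, which uses the same `${name}` syntax, reproduces the effect for local testing. `render_sql` is a hypothetical helper, not a platform API.

```python
from string import Template


def render_sql(sql: str, params: dict[str, str]) -> str:
    """Substitute ${name} references textually (assumed platform behavior)."""
    # substitute() raises KeyError for any ${name} without a value,
    # so a missing parameter surfaces before the query is sent anywhere.
    return Template(sql).substitute(params)


query = render_sql(
    "SELECT * FROM orders WHERE dt = '${biz_date}'",
    {"biz_date": "2026-06-01"},
)
print(query)  # SELECT * FROM orders WHERE dt = '2026-06-01'
```

Because the substitution is textual, keep the quotes around date values. Without them, `dt = 2026-06-01` would be parsed as an arithmetic expression in most SQL dialects.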

Recommendations

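| Recommendation | Details |
|---|---|
| Configure resource groups explicitly | Configure the resource groups that match the Cell types the Notebook actually uses, to avoid runtime errors. |
| Parameterize dates | Pass the business date through `parameterValues` instead of hardcoding dates in the Notebook. |
| Keep Notebooks short | Avoid overly long Notebooks; split complex logic across multiple Activities. |
| Avoid interactive operations | Notebooks in a Pipeline run in non-interactive mode, which does not support manual input or Widgets. |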

Last updated: 2026.06.12 11:44:17
