SAP Analytics Cloud
Important Capabilities
Capability | Status | Notes |
---|---|---|
Descriptions | ✅ | Enabled by default |
Detect Deleted Entities | ✅ | Enabled via stateful ingestion |
Platform Instance | ✅ | Enabled by default |
Schema Metadata | ✅ | Enabled by default (only for Import Data Models) |
Table-Level Lineage | ✅ | Enabled by default (only for Live Data Models) |
Configuration Notes
Refer to Manage OAuth Clients to create an OAuth client in SAP Analytics Cloud. The OAuth client must have the following properties:
- Purpose: API Access
- Access:
  - Data Import Service
- Authorization Grant: Client Credentials
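As a rough illustration of what the Client Credentials grant involves, the sketch below builds (but does not send) a standard OAuth 2.0 token request against the token URL. It assumes the token endpoint accepts HTTP Basic client authentication, which is common but not confirmed here for SAP Analytics Cloud; `build_token_request` and the credential values are hypothetical names used only for this example.

```python
import base64
import urllib.parse
import urllib.request


def build_token_request(
    token_url: str, client_id: str, client_secret: str
) -> urllib.request.Request:
    """Build (but do not send) an OAuth 2.0 client-credentials token request."""
    # Standard form body for the client credentials grant (RFC 6749, section 4.4).
    body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode("ascii")
    # Assumption: the token endpoint accepts HTTP Basic client authentication.
    credentials = base64.b64encode(
        f"{client_id}:{client_secret}".encode("utf-8")
    ).decode("ascii")
    return urllib.request.Request(
        token_url,
        data=body,
        headers={
            "Authorization": f"Basic {credentials}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


# Hypothetical values; substitute the ones from your own OAuth client.
request = build_token_request(
    "https://company.eu10.hana.ondemand.com/oauth/token",
    "my-client-id",
    "my-client-secret",
)
# Sending it would be: urllib.request.urlopen(request); the JSON response
# of a standard OAuth 2.0 server carries the token in "access_token".
```

The connector performs the token exchange itself; a request like this is only useful for checking that a newly created OAuth client works before running ingestion.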
Maintain connection mappings (optional):
To map individual connections in SAP Analytics Cloud to platforms, platform instances, and environments, use the connection_mapping configuration in the recipe:
connection_mapping:
  MY_BW_CONNECTION:
    platform: bw
    platform_instance: PROD_BW
    env: PROD
  MY_HANA_CONNECTION:
    platform: hana
    platform_instance: PROD_HANA
    env: PROD
The key in the connection mapping dictionary represents the name of the connection created in SAP Analytics Cloud.
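To make the effect of a mapping concrete, here is a minimal sketch of how a mapping entry could translate into the URN of an upstream dataset. The URN layout and the platform-instance prefix follow DataHub's general dataset URN convention; how the connector resolves connections internally is an assumption, and `ConnectionMapping`, `upstream_dataset_urn`, `detected_platform`, and the dataset name `QUERY/SALES` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ConnectionMapping:
    """Mirrors the three keys of one connection_mapping entry."""

    platform: Optional[str] = None
    platform_instance: Optional[str] = None
    env: str = "PROD"


def upstream_dataset_urn(
    connection_name: str,
    dataset_name: str,
    mappings: Dict[str, ConnectionMapping],
    detected_platform: str,
) -> str:
    # Fall back to defaults when the connection has no mapping entry.
    mapping = mappings.get(connection_name, ConnectionMapping())
    platform = mapping.platform or detected_platform
    # DataHub convention: a platform instance is prefixed to the dataset name.
    if mapping.platform_instance:
        dataset_name = f"{mapping.platform_instance}.{dataset_name}"
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{dataset_name},{mapping.env})"


mappings = {
    "MY_BW_CONNECTION": ConnectionMapping("bw", "PROD_BW", "PROD"),
    "MY_HANA_CONNECTION": ConnectionMapping("hana", "PROD_HANA", "PROD"),
}
print(upstream_dataset_urn("MY_BW_CONNECTION", "QUERY/SALES", mappings, "bw"))
# prints: urn:li:dataset:(urn:li:dataPlatform:bw,PROD_BW.QUERY/SALES,PROD)
```

The point of the mapping is that the lineage edge produced for a Live Data Model points at the same URN that the SAP BW or SAP HANA ingestion emits, so the platform instance and environment must match those used by that other source.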
Concept mapping
SAP Analytics Cloud | DataHub |
---|---|
Story | Dashboard |
Application | Dashboard |
Live Data Model | Dataset |
Import Data Model | Dataset |
Model | Dataset |
Limitations
- Only models that are used in a Story or an Application are ingested, because there is no dedicated API to retrieve models (only Stories and Applications).
- Browse Paths for models cannot be created because the folder where the models are saved is not returned by the API.
- Schema metadata is only ingested for Import Data Models because the schema metadata of the other model types cannot be retrieved.
- Lineage for Import Data Models cannot be ingested because the API does not provide any information about it.
- Currently, only SAP BW and SAP HANA are supported for ingesting the upstream lineage of Live Data Models. A warning is logged for all other connection types; feel free to open an issue on GitHub with the warning message to have support added.
- For some models (e.g., builtin models) it cannot be detected whether the models are Live Data or Import Data Models. Therefore, these models will be ingested only with the Story subtype.
CLI based Ingestion
Install the Plugin
The sac source works out of the box with acryl-datahub.
Starter Recipe
Check out the following recipe to get started with ingestion! See below for full configuration options.
For general pointers on writing and running a recipe, see our main recipe guide.
source:
  type: sac
  config:
    stateful_ingestion:
      enabled: true
    tenant_url: # Your SAP Analytics Cloud tenant URL, e.g. https://company.eu10.sapanalytics.cloud or https://company.eu10.hcs.cloud.sap
    token_url: # The Token URL of your SAP Analytics Cloud tenant, e.g. https://company.eu10.hana.ondemand.com/oauth/token.
    # Add secret in Secrets Tab with relevant names for each variable
    client_id: "${SAC_CLIENT_ID}" # Your SAP Analytics Cloud client id
    client_secret: "${SAC_CLIENT_SECRET}" # Your SAP Analytics Cloud client secret
    # ingest stories
    ingest_stories: true
    # ingest applications
    ingest_applications: true
    resource_id_pattern:
      allow:
        - .*
    resource_name_pattern:
      allow:
        - .*
    folder_pattern:
      allow:
        - .*
    connection_mapping:
      MY_BW_CONNECTION:
        platform: bw
        platform_instance: PROD_BW
        env: PROD
      MY_HANA_CONNECTION:
        platform: hana
        platform_instance: PROD_HANA
        env: PROD
Config Details
Note that a . is used to denote nested fields in the YAML recipe.
Field | Description |
---|---|
client_id ✅ string | Client ID for the OAuth authentication |
client_secret ✅ string(password) | Client secret for the OAuth authentication |
tenant_url ✅ string | URL of the SAP Analytics Cloud tenant |
token_url ✅ string | URL of the OAuth token endpoint of the SAP Analytics Cloud tenant |
incremental_lineage boolean | When enabled, emits lineage as incremental to existing lineage already in DataHub. When disabled, re-states lineage on each run. Default: False |
ingest_applications boolean | Controls whether Analytic Applications should be ingested. Default: True |
ingest_import_data_model_schema_metadata boolean | Controls whether schema metadata of Import Data Models should be ingested (ingesting schema metadata of Import Data Models significantly increases overall ingestion time). Default: True |
ingest_stories boolean | Controls whether Stories should be ingested. Default: True |
platform_instance string | The instance of the platform that all assets produced by this recipe belong to. This should be unique within the platform. See https://datahubproject.io/docs/platform-instances/ for more details. |
query_name_template string | Template for generating dataset urns of consumed queries, the placeholder {query} can be used within the template for inserting the name of the query. Default: QUERY/{name} |
env string | The environment that all assets produced by this connector belong to. Default: PROD |
connection_mapping map(str,ConnectionMappingConfig) | Custom mappings for connections. Default: {} |
connection_mapping.key.env string | The environment that this connection mapping belongs to. Default: PROD |
connection_mapping.key.platform string | The platform that this connection mapping belongs to |
connection_mapping.key.platform_instance string | The instance of the platform that this connection mapping belongs to |
folder_pattern AllowDenyPattern | Patterns for selecting folders that are to be included. Default: {'allow': ['.*'], 'deny': [], 'ignoreCase': True} |
folder_pattern.ignoreCase boolean | Whether to ignore case sensitivity during pattern matching. Default: True |
folder_pattern.allow array | List of regex patterns to include in ingestion. Default: ['.*'] |
folder_pattern.allow.string string | |
folder_pattern.deny array | List of regex patterns to exclude from ingestion. Default: [] |
folder_pattern.deny.string string | |
resource_id_pattern AllowDenyPattern | Patterns for selecting resource ids that are to be included. Default: {'allow': ['.*'], 'deny': [], 'ignoreCase': True} |
resource_id_pattern.ignoreCase boolean | Whether to ignore case sensitivity during pattern matching. Default: True |
resource_id_pattern.allow array | List of regex patterns to include in ingestion. Default: ['.*'] |
resource_id_pattern.allow.string string | |
resource_id_pattern.deny array | List of regex patterns to exclude from ingestion. Default: [] |
resource_id_pattern.deny.string string | |
resource_name_pattern AllowDenyPattern | Patterns for selecting resource names that are to be included. Default: {'allow': ['.*'], 'deny': [], 'ignoreCase': True} |
resource_name_pattern.ignoreCase boolean | Whether to ignore case sensitivity during pattern matching. Default: True |
resource_name_pattern.allow array | List of regex patterns to include in ingestion. Default: ['.*'] |
resource_name_pattern.allow.string string | |
resource_name_pattern.deny array | List of regex patterns to exclude from ingestion. Default: [] |
resource_name_pattern.deny.string string | |
stateful_ingestion StatefulStaleMetadataRemovalConfig | Stateful ingestion related configs |
stateful_ingestion.enabled boolean | Whether or not to enable stateful ingestion. Defaults to True if a pipeline_name is set and either a datahub-rest sink or datahub_api is specified, otherwise to False. |
stateful_ingestion.remove_stale_metadata boolean | Soft-deletes the entities present in the last successful run but missing in the current run with stateful_ingestion enabled. Default: True |
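The allow, deny, and ignoreCase options of folder_pattern, resource_id_pattern, and resource_name_pattern combine as sketched below. This is a simplified stand-in written for illustration, not DataHub's AllowDenyPattern class: it assumes patterns are matched from the start of the value, that deny takes precedence over allow, and the helper name `is_allowed` and the folder names are hypothetical.

```python
import re
from typing import Sequence


def is_allowed(
    value: str,
    allow: Sequence[str] = (".*",),
    deny: Sequence[str] = (),
    ignore_case: bool = True,
) -> bool:
    """Return True if value matches an allow pattern and no deny pattern."""
    flags = re.IGNORECASE if ignore_case else 0
    # Assumption: a deny match always wins over an allow match.
    if any(re.match(pattern, value, flags) for pattern in deny):
        return False
    # re.match anchors at the start of the value, not at the end.
    return any(re.match(pattern, value, flags) for pattern in allow)


print(is_allowed("Finance/Reports"))                       # True: default allow is .*
print(is_allowed("Sandbox/Test", deny=["sandbox/.*"]))     # False: case is ignored by default
print(is_allowed("Finance", allow=["sales.*"]))            # False: no allow pattern matches
```

Under these assumptions, a recipe that sets only a deny list still ingests everything else, because the allow list defaults to .* for all three pattern options.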
The JSONSchema for this configuration is inlined below.
{
"title": "SACSourceConfig",
"description": "Base configuration class for stateful ingestion for source configs to inherit from.",
"type": "object",
"properties": {
"incremental_lineage": {
"title": "Incremental Lineage",
"description": "When enabled, emits lineage as incremental to existing lineage already in DataHub. When disabled, re-states lineage on each run.",
"default": false,
"type": "boolean"
},
"env": {
"title": "Env",
"description": "The environment that all assets produced by this connector belong to",
"default": "PROD",
"type": "string"
},
"platform_instance": {
"title": "Platform Instance",
"description": "The instance of the platform that all assets produced by this recipe belong to. This should be unique within the platform. See https://datahubproject.io/docs/platform-instances/ for more details.",
"type": "string"
},
"stateful_ingestion": {
"title": "Stateful Ingestion",
"description": "Stateful ingestion related configs",
"allOf": [
{
"$ref": "#/definitions/StatefulStaleMetadataRemovalConfig"
}
]
},
"tenant_url": {
"title": "Tenant Url",
"description": "URL of the SAP Analytics Cloud tenant",
"type": "string"
},
"token_url": {
"title": "Token Url",
"description": "URL of the OAuth token endpoint of the SAP Analytics Cloud tenant",
"type": "string"
},
"client_id": {
"title": "Client Id",
"description": "Client ID for the OAuth authentication",
"type": "string"
},
"client_secret": {
"title": "Client Secret",
"description": "Client secret for the OAuth authentication",
"type": "string",
"writeOnly": true,
"format": "password"
},
"ingest_stories": {
"title": "Ingest Stories",
"description": "Controls whether Stories should be ingested",
"default": true,
"type": "boolean"
},
"ingest_applications": {
"title": "Ingest Applications",
"description": "Controls whether Analytic Applications should be ingested",
"default": true,
"type": "boolean"
},
"ingest_import_data_model_schema_metadata": {
"title": "Ingest Import Data Model Schema Metadata",
"description": "Controls whether schema metadata of Import Data Models should be ingested (ingesting schema metadata of Import Data Models significantly increases overall ingestion time)",
"default": true,
"type": "boolean"
},
"resource_id_pattern": {
"title": "Resource Id Pattern",
"description": "Patterns for selecting resource ids that are to be included",
"default": {
"allow": [
".*"
],
"deny": [],
"ignoreCase": true
},
"allOf": [
{
"$ref": "#/definitions/AllowDenyPattern"
}
]
},
"resource_name_pattern": {
"title": "Resource Name Pattern",
"description": "Patterns for selecting resource names that are to be included",
"default": {
"allow": [
".*"
],
"deny": [],
"ignoreCase": true
},
"allOf": [
{
"$ref": "#/definitions/AllowDenyPattern"
}
]
},
"folder_pattern": {
"title": "Folder Pattern",
"description": "Patterns for selecting folders that are to be included",
"default": {
"allow": [
".*"
],
"deny": [],
"ignoreCase": true
},
"allOf": [
{
"$ref": "#/definitions/AllowDenyPattern"
}
]
},
"connection_mapping": {
"title": "Connection Mapping",
"description": "Custom mappings for connections",
"default": {},
"type": "object",
"additionalProperties": {
"$ref": "#/definitions/ConnectionMappingConfig"
}
},
"query_name_template": {
"title": "Query Name Template",
"description": "Template for generating dataset urns of consumed queries, the placeholder {query} can be used within the template for inserting the name of the query",
"default": "QUERY/{name}",
"type": "string"
}
},
"required": [
"tenant_url",
"token_url",
"client_id",
"client_secret"
],
"additionalProperties": false,
"definitions": {
"DynamicTypedStateProviderConfig": {
"title": "DynamicTypedStateProviderConfig",
"type": "object",
"properties": {
"type": {
"title": "Type",
"description": "The type of the state provider to use. For DataHub use `datahub`",
"type": "string"
},
"config": {
"title": "Config",
"description": "The configuration required for initializing the state provider. Default: The datahub_api config if set at pipeline level. Otherwise, the default DatahubClientConfig. See the defaults (https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/graph/client.py#L19).",
"default": {},
"type": "object"
}
},
"required": [
"type"
],
"additionalProperties": false
},
"StatefulStaleMetadataRemovalConfig": {
"title": "StatefulStaleMetadataRemovalConfig",
"description": "Base specialized config for Stateful Ingestion with stale metadata removal capability.",
"type": "object",
"properties": {
"enabled": {
"title": "Enabled",
"description": "Whether or not to enable stateful ingest. Default: True if a pipeline_name is set and either a datahub-rest sink or `datahub_api` is specified, otherwise False",
"default": false,
"type": "boolean"
},
"remove_stale_metadata": {
"title": "Remove Stale Metadata",
"description": "Soft-deletes the entities present in the last successful run but missing in the current run with stateful_ingestion enabled.",
"default": true,
"type": "boolean"
}
},
"additionalProperties": false
},
"AllowDenyPattern": {
"title": "AllowDenyPattern",
"description": "A class to store allow deny regexes",
"type": "object",
"properties": {
"allow": {
"title": "Allow",
"description": "List of regex patterns to include in ingestion",
"default": [
".*"
],
"type": "array",
"items": {
"type": "string"
}
},
"deny": {
"title": "Deny",
"description": "List of regex patterns to exclude from ingestion.",
"default": [],
"type": "array",
"items": {
"type": "string"
}
},
"ignoreCase": {
"title": "Ignorecase",
"description": "Whether to ignore case sensitivity during pattern matching.",
"default": true,
"type": "boolean"
}
},
"additionalProperties": false
},
"ConnectionMappingConfig": {
"title": "ConnectionMappingConfig",
"description": "Any source that produces dataset urns in a single environment should inherit this class",
"type": "object",
"properties": {
"env": {
"title": "Env",
"description": "The environment that this connection mapping belongs to",
"default": "PROD",
"type": "string"
},
"platform": {
"title": "Platform",
"description": "The platform that this connection mapping belongs to",
"type": "string"
},
"platform_instance": {
"title": "Platform Instance",
"description": "The instance of the platform that this connection mapping belongs to",
"type": "string"
}
},
"additionalProperties": false
}
}
}
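Because the schema lists four required fields and sets additionalProperties to false at the top level, a recipe's config block can be sanity-checked before running ingestion. The sketch below is a hand-written check of only those two rules (it does not validate types or nested objects); `check_config` is a hypothetical helper, and the field names are copied from the schema above.

```python
from typing import Any, Dict, List

# Taken from the "required" list of the schema.
REQUIRED_FIELDS = ("tenant_url", "token_url", "client_id", "client_secret")

# Taken from the top-level "properties" of the schema.
KNOWN_FIELDS = set(REQUIRED_FIELDS) | {
    "incremental_lineage",
    "env",
    "platform_instance",
    "stateful_ingestion",
    "ingest_stories",
    "ingest_applications",
    "ingest_import_data_model_schema_metadata",
    "resource_id_pattern",
    "resource_name_pattern",
    "folder_pattern",
    "connection_mapping",
    "query_name_template",
}


def check_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means both rules pass."""
    problems = [
        f"missing required field: {field}"
        for field in REQUIRED_FIELDS
        if field not in config
    ]
    # additionalProperties is false, so unknown top-level keys are rejected.
    problems += [
        f"unknown field: {key}" for key in sorted(config) if key not in KNOWN_FIELDS
    ]
    return problems


print(check_config({"tenant_url": "https://company.eu10.sapanalytics.cloud"}))
```

A check like this mostly catches typos in field names early; the connector's own config validation remains the authoritative one.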
Code Coordinates
- Class Name: datahub.ingestion.source.sac.sac.SACSource
- Browse on GitHub
Questions
If you've got any questions on configuring ingestion for SAP Analytics Cloud, feel free to ping us on our Slack.