Salesforce
Important Capabilities
Capability | Status | Notes |
---|---|---|
Data Profiling | ✅ | Only table level profiling is supported via profiling.enabled config field |
Detect Deleted Entities | ❌ | Not supported yet |
Domains | ✅ | Supported via the domain config field |
Extract Tags | ✅ | Enabled by default |
Platform Instance | ✅ | Can be equivalent to Salesforce organization |
Schema Metadata | ✅ | Enabled by default |
Prerequisites
In order to ingest metadata from Salesforce, you will need one of:
- Salesforce username, password, security token
- Salesforce username, consumer key, and private key for JSON Web Token (JWT) access
- Salesforce instance URL and access token/session ID (suitable for one-shot ingestion only, as the access token typically expires after 2 hours of inactivity)
The account used to access Salesforce requires the following permissions for this integration to work:
- View Setup and Configuration
- View All Data
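The three credential combinations listed under Prerequisites line up with the three `auth` values in the configuration reference on this page (`USERNAME_PASSWORD`, `JSON_WEB_TOKEN`, `DIRECT_ACCESS_TOKEN`). As a rough sketch, a recipe's `config` block could be checked before running ingestion like this. The helper name `missing_fields` is hypothetical, and the grouping of fields per auth mode is read off the list above rather than taken from the connector's code.

```python
from typing import List, Mapping

# Field names and auth values come from the config reference on this page.
# The grouping per auth mode mirrors the Prerequisites list (an assumption
# about what the connector requires, not its actual validation logic).
REQUIRED_FIELDS = {
    "USERNAME_PASSWORD": ("username", "password", "security_token"),
    "JSON_WEB_TOKEN": ("username", "consumer_key", "private_key"),
    "DIRECT_ACCESS_TOKEN": ("instance_url", "access_token"),
}


def missing_fields(config: Mapping[str, str]) -> List[str]:
    """Return the credential fields still missing for the configured auth mode."""
    auth = config.get("auth", "USERNAME_PASSWORD")  # documented default
    if auth not in REQUIRED_FIELDS:
        raise ValueError(f"unknown auth type: {auth}")
    return [name for name in REQUIRED_FIELDS[auth] if not config.get(name)]


if __name__ == "__main__":
    print(missing_fields({"auth": "JSON_WEB_TOKEN", "username": "user@company"}))
```

An empty list means the chosen auth mode has every credential it needs.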
Integration Details
This plugin extracts Salesforce Standard and Custom Objects and their details (fields, record count, etc.) from a Salesforce instance. The Python library simple-salesforce is used to authenticate and to call the Salesforce REST API to retrieve these details.
REST API Resources used in this integration
- Versions
- Tooling API Query on objects EntityDefinition, EntityParticle, CustomObject, CustomField
- Record Count
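To make the three resources above more concrete, the sketch below builds the request URLs they correspond to in the Salesforce REST API. The API version `57.0` and the sample SOQL text are assumptions chosen for illustration; the source does not state which version or which exact queries the connector issues.

```python
from typing import List
from urllib.parse import urlencode

API_VERSION = "57.0"  # assumption: any version reported by the Versions resource


def versions_url(instance_url: str) -> str:
    """Versions resource: lists the API versions the instance supports."""
    return f"{instance_url.rstrip('/')}/services/data/"


def tooling_query_url(instance_url: str, soql: str) -> str:
    """Tooling API Query resource for a SOQL statement."""
    base = f"{instance_url.rstrip('/')}/services/data/v{API_VERSION}/tooling/query/"
    return f"{base}?{urlencode({'q': soql})}"


def record_count_url(instance_url: str, object_names: List[str]) -> str:
    """Record Count resource for a comma-separated list of objects."""
    base = f"{instance_url.rstrip('/')}/services/data/v{API_VERSION}/limits/recordCount"
    query = urlencode({"sObjects": ",".join(object_names)}, safe=",")
    return f"{base}?{query}"


if __name__ == "__main__":
    url = "https://mydomain.my.salesforce.com/"
    print(versions_url(url))
    print(tooling_query_url(url, "SELECT QualifiedApiName FROM EntityDefinition"))
    print(record_count_url(url, ["Account", "Lead"]))
```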
Concept Mapping
This ingestion source maps the following Source System Concepts to DataHub Concepts:
Source Concept | DataHub Concept | Notes |
---|---|---|
Salesforce | Data Platform | |
Standard Object | Dataset | subtype "Standard Object" |
Custom Object | Dataset | subtype "Custom Object" |
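The mapping in the table above can be illustrated with a small sketch that turns a Salesforce object name into a DataHub dataset URN and a subtype. The URN layout follows DataHub's usual `urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)` form with the platform instance prefixed to the name. Using the `__c` suffix to tell custom objects apart is an assumption based on Salesforce naming conventions, not something the source confirms about the connector; both helper names are hypothetical.

```python
def dataset_urn(object_name: str, platform_instance: str = "", env: str = "PROD") -> str:
    """Build a dataset URN for a Salesforce object (illustrative, hand-rolled)."""
    name = f"{platform_instance}.{object_name}" if platform_instance else object_name
    return f"urn:li:dataset:(urn:li:dataPlatform:salesforce,{name},{env})"


def subtype(object_name: str) -> str:
    """Guess the dataset subtype from the object's API name."""
    # Assumption: custom object API names end in "__c".
    return "Custom Object" if object_name.endswith("__c") else "Standard Object"


if __name__ == "__main__":
    for obj in ("Account", "Invoice__c"):
        print(dataset_urn(obj, "mydomain-dev-ed"), subtype(obj))
```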
Caveats
- This connector has only been tested with Salesforce Developer Edition.
- This connector currently supports only table-level profiling (row and column counts). Row counts are approximate, as returned by the Salesforce RecordCount REST API.
- This integration does not support ingesting Salesforce External Objects.
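As an illustration of where the approximate row counts come from, the sketch below parses a Record Count style response into a name-to-count mapping. The response shape (an `sObjects` list of `name`/`count` entries) is assumed from the Salesforce REST API, and the sample values are made up; the helper name `row_counts` is hypothetical.

```python
import json
from typing import Dict

# Made-up sample payload in the assumed Record Count response shape.
SAMPLE_RESPONSE = """
{"sObjects": [{"count": 3, "name": "Account"}, {"count": 2, "name": "Contact"}]}
"""


def row_counts(response_text: str) -> Dict[str, int]:
    """Map each object name to its (approximate) record count."""
    payload = json.loads(response_text)
    return {entry["name"]: entry["count"] for entry in payload.get("sObjects", [])}


if __name__ == "__main__":
    print(row_counts(SAMPLE_RESPONSE))
```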
CLI based Ingestion
Install the Plugin
pip install 'acryl-datahub[salesforce]'
Starter Recipe
Check out the following recipe to get started with ingestion! See below for full configuration options.
For general pointers on writing and running a recipe, see our main recipe guide.
pipeline_name: my_salesforce_pipeline
source:
  type: "salesforce"
  config:
    instance_url: "https://mydomain.my.salesforce.com/"
    username: user@company
    password: password_for_user
    security_token: security_token_for_user
    platform_instance: mydomain-dev-ed
    domain:
      sales:
        allow:
          - "Opportunity$"
          - "Lead$"
    object_pattern:
      allow:
        - "Account$"
        - "Opportunity$"
        - "Lead$"
sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
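The `object_pattern` and `domain` entries in the recipe are allow/deny regex lists. The sketch below shows one plausible reading of how such patterns select objects: a name is rejected if any deny pattern matches, and accepted if any allow pattern matches. Matching with `re.match` (anchored at the start of the name) and the helper names `is_allowed` and `domain_for` are assumptions for illustration, not the connector's confirmed implementation.

```python
import re
from typing import Dict, List, Optional, Sequence


def is_allowed(
    name: str,
    allow: Sequence[str] = (".*",),
    deny: Sequence[str] = (),
    ignore_case: bool = True,
) -> bool:
    """Deny patterns win; otherwise any matching allow pattern admits the name."""
    flags = re.IGNORECASE if ignore_case else 0
    if any(re.match(pattern, name, flags) for pattern in deny):
        return False
    return any(re.match(pattern, name, flags) for pattern in allow)


def domain_for(name: str, domains: Dict[str, Dict[str, List[str]]]) -> Optional[str]:
    """Return the first domain key whose patterns admit the name, if any."""
    for key, pattern in domains.items():
        if is_allowed(name, pattern.get("allow", [".*"]), pattern.get("deny", [])):
            return key
    return None


if __name__ == "__main__":
    allow = ["Account$", "Opportunity$", "Lead$"]
    domains = {"sales": {"allow": ["Opportunity$", "Lead$"]}}
    for obj in ("Account", "AccountContactRelation", "Lead", "Contact"):
        print(obj, is_allowed(obj, allow), domain_for(obj, domains))
```

Under this reading, the trailing `$` in `"Account$"` keeps longer names such as `AccountContactRelation` out of the ingestion.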
Config Details
Note that a . is used to denote nested fields in the YAML recipe.
Field | Description |
---|---|
access_token string | Access token for instance url |
auth Enum | Default: USERNAME_PASSWORD |
consumer_key string | Consumer key for Salesforce JSON web token access |
ingest_tags boolean | Ingest Tags from source. This will override Tags entered from UI Default: False |
instance_url string | Salesforce instance url. e.g. https://MyDomainName.my.salesforce.com |
is_sandbox boolean | Connect to Sandbox instance of your Salesforce Default: False |
password string | Password for Salesforce user |
platform string | Default: salesforce |
platform_instance string | The instance of the platform that all assets produced by this recipe belong to |
private_key string | Private key as a string for Salesforce JSON web token access |
security_token string | Security token for Salesforce username |
username string | Salesforce username |
env string | The environment that all assets produced by this connector belong to Default: PROD |
domain map(str,AllowDenyPattern) | A class to store allow deny regexes |
domain.key.allow array | List of regex patterns to include in ingestion Default: ['.*'] |
domain.key.allow.string string | |
domain.key.ignoreCase boolean | Whether to ignore case sensitivity during pattern matching. Default: True |
domain.key.deny array | List of regex patterns to exclude from ingestion. Default: [] |
domain.key.deny.string string | |
object_pattern AllowDenyPattern | Regex patterns for Salesforce objects to filter in ingestion. Default: {'allow': ['.*'], 'deny': [], 'ignoreCase': True} |
object_pattern.ignoreCase boolean | Whether to ignore case sensitivity during pattern matching. Default: True |
object_pattern.allow array | List of regex patterns to include in ingestion Default: ['.*'] |
object_pattern.allow.string string | |
object_pattern.deny array | List of regex patterns to exclude from ingestion. Default: [] |
object_pattern.deny.string string | |
profile_pattern AllowDenyPattern | Regex patterns for profiles to filter in ingestion, allowed by the object_pattern. Default: {'allow': ['.*'], 'deny': [], 'ignoreCase': True} |
profile_pattern.ignoreCase boolean | Whether to ignore case sensitivity during pattern matching. Default: True |
profile_pattern.allow array | List of regex patterns to include in ingestion Default: ['.*'] |
profile_pattern.allow.string string | |
profile_pattern.deny array | List of regex patterns to exclude from ingestion. Default: [] |
profile_pattern.deny.string string | |
profiling SalesforceProfilingConfig | Default: {'enabled': False, 'operation_config': {'lower_fre... |
profiling.enabled boolean | Whether profiling should be done. Supports only table-level profiling at this stage Default: False |
profiling.operation_config OperationConfig | Experimental feature. To specify operation configs. |
profiling.operation_config.lower_freq_profile_enabled boolean | Whether to profile at a lower frequency. This does not do any scheduling; it only adds checks for when not to run profiling. Default: False |
profiling.operation_config.profile_date_of_month integer | Number from 1 to 31 for the date of the month (both inclusive). If not specified, defaults to Nothing and this field does not take effect. |
profiling.operation_config.profile_day_of_week integer | Number from 0 to 6 for the day of the week (both inclusive). 0 is Monday and 6 is Sunday. If not specified, defaults to Nothing and this field does not take effect. |
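The `profiling.operation_config` fields describe a gate rather than a scheduler. A minimal sketch of such a gate, assuming the checks are applied in the way the descriptions suggest, might look like the following. The function name `should_profile` is hypothetical, and the function only covers the lower-frequency checks; `profiling.enabled` would still need to be true for any profiling to happen.

```python
import datetime
from typing import Optional


def should_profile(
    today: datetime.date,
    lower_freq_profile_enabled: bool = False,
    profile_day_of_week: Optional[int] = None,
    profile_date_of_month: Optional[int] = None,
) -> bool:
    """Return False only when lower-frequency checks are on and today does not match."""
    if not lower_freq_profile_enabled:
        return True  # the day/date fields have no effect
    # date.weekday() uses 0 for Monday and 6 for Sunday, like the config field.
    if profile_day_of_week is not None and profile_day_of_week != today.weekday():
        return False
    if profile_date_of_month is not None and profile_date_of_month != today.day:
        return False
    return True


if __name__ == "__main__":
    print(should_profile(datetime.date.today(), True, profile_day_of_week=0))
```

With this gate, a pipeline scheduled daily would profile only on runs that fall on the configured day, and skip profiling on the others.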
The JSONSchema for this configuration is inlined below.
{
"title": "SalesforceConfig",
"description": "Any source that is a primary producer of Dataset metadata should inherit this class",
"type": "object",
"properties": {
"env": {
"title": "Env",
"description": "The environment that all assets produced by this connector belong to",
"default": "PROD",
"type": "string"
},
"platform_instance": {
"title": "Platform Instance",
"description": "The instance of the platform that all assets produced by this recipe belong to",
"type": "string"
},
"platform": {
"title": "Platform",
"default": "salesforce",
"type": "string"
},
"auth": {
"default": "USERNAME_PASSWORD",
"allOf": [
{
"$ref": "#/definitions/SalesforceAuthType"
}
]
},
"username": {
"title": "Username",
"description": "Salesforce username",
"type": "string"
},
"password": {
"title": "Password",
"description": "Password for Salesforce user",
"type": "string"
},
"consumer_key": {
"title": "Consumer Key",
"description": "Consumer key for Salesforce JSON web token access",
"type": "string"
},
"private_key": {
"title": "Private Key",
"description": "Private key as a string for Salesforce JSON web token access",
"type": "string"
},
"security_token": {
"title": "Security Token",
"description": "Security token for Salesforce username",
"type": "string"
},
"instance_url": {
"title": "Instance Url",
"description": "Salesforce instance url. e.g. https://MyDomainName.my.salesforce.com",
"type": "string"
},
"is_sandbox": {
"title": "Is Sandbox",
"description": "Connect to Sandbox instance of your Salesforce",
"default": false,
"type": "boolean"
},
"access_token": {
"title": "Access Token",
"description": "Access token for instance url",
"type": "string"
},
"ingest_tags": {
"title": "Ingest Tags",
"description": "Ingest Tags from source. This will override Tags entered from UI",
"default": false,
"type": "boolean"
},
"object_pattern": {
"title": "Object Pattern",
"description": "Regex patterns for Salesforce objects to filter in ingestion.",
"default": {
"allow": [
".*"
],
"deny": [],
"ignoreCase": true
},
"allOf": [
{
"$ref": "#/definitions/AllowDenyPattern"
}
]
},
"domain": {
"title": "Domain",
"description": "Regex patterns for tables/schemas to describe domain_key domain key (domain_key can be any string like \"sales\".) There can be multiple domain keys specified.",
"default": {},
"type": "object",
"additionalProperties": {
"$ref": "#/definitions/AllowDenyPattern"
}
},
"profiling": {
"title": "Profiling",
"default": {
"enabled": false,
"operation_config": {
"lower_freq_profile_enabled": false,
"profile_day_of_week": null,
"profile_date_of_month": null
}
},
"allOf": [
{
"$ref": "#/definitions/SalesforceProfilingConfig"
}
]
},
"profile_pattern": {
"title": "Profile Pattern",
"description": "Regex patterns for profiles to filter in ingestion, allowed by the `object_pattern`.",
"default": {
"allow": [
".*"
],
"deny": [],
"ignoreCase": true
},
"allOf": [
{
"$ref": "#/definitions/AllowDenyPattern"
}
]
}
},
"additionalProperties": false,
"definitions": {
"SalesforceAuthType": {
"title": "SalesforceAuthType",
"description": "An enumeration.",
"enum": [
"USERNAME_PASSWORD",
"DIRECT_ACCESS_TOKEN",
"JSON_WEB_TOKEN"
]
},
"AllowDenyPattern": {
"title": "AllowDenyPattern",
"description": "A class to store allow deny regexes",
"type": "object",
"properties": {
"allow": {
"title": "Allow",
"description": "List of regex patterns to include in ingestion",
"default": [
".*"
],
"type": "array",
"items": {
"type": "string"
}
},
"deny": {
"title": "Deny",
"description": "List of regex patterns to exclude from ingestion.",
"default": [],
"type": "array",
"items": {
"type": "string"
}
},
"ignoreCase": {
"title": "Ignorecase",
"description": "Whether to ignore case sensitivity during pattern matching.",
"default": true,
"type": "boolean"
}
},
"additionalProperties": false
},
"OperationConfig": {
"title": "OperationConfig",
"type": "object",
"properties": {
"lower_freq_profile_enabled": {
"title": "Lower Freq Profile Enabled",
"description": "Whether to do profiling at lower freq or not. This does not do any scheduling just adds additional checks to when not to run profiling.",
"default": false,
"type": "boolean"
},
"profile_day_of_week": {
"title": "Profile Day Of Week",
"description": "Number between 0 to 6 for day of week (both inclusive). 0 is Monday and 6 is Sunday. If not specified, defaults to Nothing and this field does not take affect.",
"type": "integer"
},
"profile_date_of_month": {
"title": "Profile Date Of Month",
"description": "Number between 1 to 31 for date of month (both inclusive). If not specified, defaults to Nothing and this field does not take affect.",
"type": "integer"
}
},
"additionalProperties": false
},
"SalesforceProfilingConfig": {
"title": "SalesforceProfilingConfig",
"type": "object",
"properties": {
"enabled": {
"title": "Enabled",
"description": "Whether profiling should be done. Supports only table-level profiling at this stage",
"default": false,
"type": "boolean"
},
"operation_config": {
"title": "Operation Config",
"description": "Experimental feature. To specify operation configs.",
"allOf": [
{
"$ref": "#/definitions/OperationConfig"
}
]
}
},
"additionalProperties": false
}
}
}
Code Coordinates
- Class Name: datahub.ingestion.source.salesforce.SalesforceSource
- Browse on GitHub
Questions
If you've got any questions on configuring ingestion for Salesforce, feel free to ping us on our Slack.