Version: Next

Salesforce

Incubating

Important Capabilities

  • Data Profiling: Only table-level profiling is supported, via the profiling.enabled config field
  • Detect Deleted Entities: Not supported yet
  • Domains: Supported via the domain config field
  • Extract Tags: Enabled by default
  • Platform Instance: Can be equivalent to a Salesforce organization
  • Schema Metadata: Enabled by default

Prerequisites

In order to ingest metadata from Salesforce, you will need one of:

  • Salesforce username, password, and security token
  • Salesforce username, consumer key, and private key for JSON web token access
  • Salesforce instance URL and access token/session ID (suitable for one-shot ingestion only, as the access token typically expires after 2 hours of inactivity)
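For the JSON web token option, a recipe's source config could look like the sketch below. All values are placeholders, and we assume the auth enum accepts a JSON_WEB_TOKEN value alongside the USERNAME_PASSWORD default listed in the config details:

```yml
source:
  type: salesforce
  config:
    auth: JSON_WEB_TOKEN                         # assumed enum value
    instance_url: "https://MyDomainName.my.salesforce.com"
    username: user@company
    consumer_key: consumer_key_of_connected_app  # placeholder
    private_key: private_key_as_string           # placeholder
```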

The account used to access Salesforce requires the following permissions for this integration to work:

  • View Setup and Configuration
  • View All Data

Integration Details

This plugin extracts Salesforce Standard and Custom Objects and their details (fields, record count, etc.) from a Salesforce instance. The Python library simple-salesforce is used to authenticate and call the Salesforce REST API to retrieve details from the Salesforce instance.

REST API Resources used in this integration

Concept Mapping

This ingestion source maps the following Source System Concepts to DataHub Concepts:

  • Salesforce → Data Platform
  • Standard Object → Dataset (subtype "Standard Object")
  • Custom Object → Dataset (subtype "Custom Object")
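The Standard vs. Custom Object split above follows Salesforce's API naming convention: custom object API names carry a "__c" suffix. A minimal illustration (the helper name is ours, not part of the connector):

```python
def salesforce_subtype(api_name: str) -> str:
    """Map a Salesforce object API name to a DataHub dataset subtype.

    Salesforce appends "__c" to the API names of custom objects;
    anything else is treated here as a standard object.
    """
    return "Custom Object" if api_name.endswith("__c") else "Standard Object"

print(salesforce_subtype("Account"))      # Standard Object
print(salesforce_subtype("Invoice__c"))   # Custom Object
```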

Caveats

  • This connector has only been tested with Salesforce Developer Edition.
  • This connector currently supports only table-level profiling (row and column counts). Row counts are approximate, as returned by the Salesforce RecordCount REST API.
  • This integration does not support ingesting Salesforce External Objects.

CLI based Ingestion

Install the Plugin

pip install 'acryl-datahub[salesforce]'

Starter Recipe

Check out the following recipe to get started with ingestion! See below for full configuration options.

For general pointers on writing and running a recipe, see our main recipe guide.

pipeline_name: my_salesforce_pipeline
source:
  type: "salesforce"
  config:
    instance_url: "https://mydomain.my.salesforce.com/"
    username: user@company
    password: password_for_user
    security_token: security_token_for_user
    platform_instance: mydomain-dev-ed
    domain:
      sales:
        allow:
          - "Opportunity$"
          - "Lead$"
    object_pattern:
      allow:
        - "Account$"
        - "Opportunity$"
        - "Lead$"

sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
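The allow entries in object_pattern and domain above are regular expressions. As a rough sketch of how such filtering behaves, assuming re.match-style anchored matching with deny taking precedence and case-insensitive defaults (the helper below is illustrative, not the connector's code):

```python
import re

def is_allowed(name: str, allow: list[str], deny: list[str]) -> bool:
    # Deny patterns win over allow patterns; matching is anchored at the
    # start of the name (re.match) and case-insensitive by default.
    flags = re.IGNORECASE
    if any(re.match(p, name, flags) for p in deny):
        return False
    return any(re.match(p, name, flags) for p in allow)

allow = ["Account$", "Opportunity$", "Lead$"]
print(is_allowed("Account", allow, []))         # True
print(is_allowed("AccountHistory", allow, []))  # False: "Account$" is anchored
```

This is why the recipe's patterns end in "$": without the anchor, "Account" would also match "AccountHistory" and similar derived objects.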

Config Details

Note that a . is used to denote nested fields in the YAML recipe.

  • access_token (string): Access token for the instance URL.
  • auth (Enum): Default: USERNAME_PASSWORD.
  • consumer_key (string): Consumer key for Salesforce JSON web token access.
  • ingest_tags (boolean): Ingest tags from the source. This will override tags entered from the UI. Default: False.
  • instance_url (string): Salesforce instance URL, e.g. https://MyDomainName.my.salesforce.com
  • is_sandbox (boolean): Connect to a Sandbox instance of your Salesforce. Default: False.
  • password (string): Password for the Salesforce user.
  • platform (string): Default: salesforce.
  • platform_instance (string): The instance of the platform that all assets produced by this recipe belong to.
  • private_key (string): Private key as a string for Salesforce JSON web token access.
  • security_token (string): Security token for the Salesforce username.
  • username (string): Salesforce username.
  • env (string): The environment that all assets produced by this connector belong to. Default: PROD.
  • domain (map(str,AllowDenyPattern)): A map of domain names to allow/deny regex patterns.
  • domain.key.allow (array of string): List of regex patterns to include in ingestion. Default: ['.*'].
  • domain.key.deny (array of string): List of regex patterns to exclude from ingestion. Default: [].
  • domain.key.ignoreCase (boolean): Whether to ignore case sensitivity during pattern matching. Default: True.
  • object_pattern (AllowDenyPattern): Regex patterns for Salesforce objects to filter in ingestion. Default: {'allow': ['.*'], 'deny': [], 'ignoreCase': True}.
  • object_pattern.allow (array of string): List of regex patterns to include in ingestion. Default: ['.*'].
  • object_pattern.deny (array of string): List of regex patterns to exclude from ingestion. Default: [].
  • object_pattern.ignoreCase (boolean): Whether to ignore case sensitivity during pattern matching. Default: True.
  • profile_pattern (AllowDenyPattern): Regex patterns for profiles to filter in ingestion, among those allowed by the object_pattern. Default: {'allow': ['.*'], 'deny': [], 'ignoreCase': True}.
  • profile_pattern.allow (array of string): List of regex patterns to include in ingestion. Default: ['.*'].
  • profile_pattern.deny (array of string): List of regex patterns to exclude from ingestion. Default: [].
  • profile_pattern.ignoreCase (boolean): Whether to ignore case sensitivity during pattern matching. Default: True.
  • profiling (SalesforceProfilingConfig): Default: {'enabled': False, 'operation_config': {'lower_fre...
  • profiling.enabled (boolean): Whether profiling should be done. Supports only table-level profiling at this stage. Default: False.
  • profiling.operation_config (OperationConfig): Experimental feature, to specify operation configs.
  • profiling.operation_config.lower_freq_profile_enabled (boolean): Whether to profile at a lower frequency. This does not do any scheduling; it only adds additional checks for when not to run profiling. Default: False.
  • profiling.operation_config.profile_date_of_month (integer): Number between 1 and 31 for the date of the month (both inclusive). If not specified, this field does not take effect.
  • profiling.operation_config.profile_day_of_week (integer): Number between 0 and 6 for the day of the week (both inclusive). 0 is Monday and 6 is Sunday. If not specified, this field does not take effect.
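Taken together, the lower_freq_profile_enabled gate and the two schedule fields can be read as a simple date check. A sketch of that logic under our reading of the docs (the function and parameter names are ours; the real check lives inside the DataHub source):

```python
from datetime import date
from typing import Optional

def should_profile(today: date,
                   lower_freq_profile_enabled: bool = False,
                   profile_date_of_month: Optional[int] = None,
                   profile_day_of_week: Optional[int] = None) -> bool:
    # Without the lower-frequency gate, profiling runs whenever enabled.
    if not lower_freq_profile_enabled:
        return True
    if profile_date_of_month is not None and today.day != profile_date_of_month:
        return False
    # date.weekday() uses the same convention as the config: 0=Monday, 6=Sunday.
    if profile_day_of_week is not None and today.weekday() != profile_day_of_week:
        return False
    return True

# 2024-01-01 was a Monday (weekday 0)
print(should_profile(date(2024, 1, 1), True, profile_day_of_week=0))  # True
print(should_profile(date(2024, 1, 2), True, profile_day_of_week=0))  # False
```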

Code Coordinates

  • Class Name: datahub.ingestion.source.salesforce.SalesforceSource
  • Browse on GitHub

Questions

If you've got any questions on configuring ingestion for Salesforce, feel free to ping us on our Slack.