1. Introduction
This tutorial guides you through transferring your data to Databricks for storage and further processing.
To transfer data from a Workspace to Databricks:
- Add Databricks as a Destination to the Workspace.
- Assign the Databricks Destination to the Workspace.
When you add a Destination, you save the information about how to connect to it. You can add as many Destinations to a Workspace as you want, but you can only select one of the added Destinations to receive your data. After you add a Destination to a Workspace, it is available in each Workspace lower in the hierarchy.
When you assign a Destination to a Workspace, each time data is fetched for a Datastream in this Workspace (for which you have enabled the Destination option), it is transferred to Databricks. You cannot assign more than one Destination to a Workspace.
2. Prerequisites
Before you add Databricks as a Destination to your Workspace, perform all of the following actions:
- Set up instance profiles for access to S3 buckets from Databricks clusters. For more information about this step, consult the Databricks documentation.
- Obtain an Access Key ID and a Secret Access Key. Use an AWS policy file as you would for an AWS S3 Destination.
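As a rough illustration, the policy attached to the IAM user whose Access Key ID and Secret Access Key you obtain might look like the following sketch. The bucket name is a placeholder, and the exact set of actions your setup requires may differ, so treat the AWS and Databricks documentation as authoritative.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::your-databricks-bucket",
        "arn:aws:s3:::your-databricks-bucket/*"
      ]
    }
  ]
}
```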
3. Add Databricks as Destination
To add Databricks as a Destination to a workspace:
- In Adverity, select the Transfer element.
- Click the + Add button.
- Select Databricks.
- Choose one of the following options:
- Select Setup a new connection to authorize the new connection with your own credentials.
- Select Send an access request to ask someone else to authorize the new connection.
3.1. Authorize the new connection with your own credentials
To authorize the new connection with your own credentials:
- In the Connection page, fill in the following fields:
Field name | Description
Personal Access Token | The personal access token generated in Databricks. For more information about this step, consult the Databricks documentation.
Delta Lake Instance | The address of the Delta Lake instance to which you want to connect.
S3 Bucket | The address of the S3 bucket that you set up for Databricks.
Access Key ID | The Access Key ID with which Adverity can access the S3 bucket.
Secret Access Key | The Secret Access Key with which Adverity can access the S3 bucket.
Instance Profile ARN | The Instance Profile ARN of the instance profile that you set up to access the S3 bucket.
- Click Authorize.
- In the Configuration page, fill in the following fields:
Field name | Description
Workspace | Select the Workspace to which you want to add the new Destination.
Name | Specify the name of the new Destination.
Connection | Choose the Connection you set up on the previous page.
Database | Specify the name of the Databricks database to which the data is transferred.
Partition by date | (Recommended) If enabled, the target table is partitioned by a date column, and data is only replaced based on the date. This means that if you import data to a table where data already exists for some dates, the data for these dates is overwritten and the data for other dates remains unchanged. This option only has an effect if you also enable Datastream►Local Data Retention►Extract Filenames►Unique by day.
Target Table Names | (Optional) You can specify a Target Table within the database for each Datastream. Adverity will transfer data from the Datastream to its Target Table. If you leave this field empty, Adverity will save data from each Datastream in a different table named datastreamtype_datastreamID (for example, mailgun_83). To specify the target table for a Datastream: You can specify the same Target Table for several Datastreams. If a column is shared between Datastreams, Adverity will perform a full outer join and concatenate values. If a column is not shared between Datastreams, Adverity will write null values in the relevant cells.
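The default table naming, the Partition by date overwrite rule, and the shared-Target-Table behaviour described above can be sketched in Python. This is only an illustration of the documented semantics (rows replaced per date; columns a Datastream does not supply filled with nulls), not Adverity's actual implementation, and all function names here are hypothetical.

```python
def default_table_name(datastream_type: str, datastream_id: int) -> str:
    # Default naming when Target Table is left empty: datastreamtype_datastreamID.
    return f"{datastream_type}_{datastream_id}"

def overwrite_by_date(existing, incoming, date_column="date"):
    # With Partition by date enabled, rows for dates present in the new
    # import replace existing rows for those dates; other dates are untouched.
    new_dates = {row[date_column] for row in incoming}
    kept = [row for row in existing if row[date_column] not in new_dates]
    return kept + incoming

def merge_into_target(rows_a, rows_b):
    # Two Datastreams sharing one Target Table: align on the union of all
    # column names; columns a source Datastream does not supply become None (null).
    columns = sorted({c for row in rows_a + rows_b for c in row})
    return [{c: row.get(c) for c in columns} for row in rows_a + rows_b]

print(default_table_name("mailgun", 83))  # mailgun_83

a = [{"date": "2024-01-01", "clicks": 10}]
b = [{"date": "2024-01-01", "cost": 2.5}]
for row in merge_into_target(a, b):
    print(row)
```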
- Click Next.
- (Optional) In the Assignment page, assign the Databricks Destination to the Workspace by clicking Assign. Alternatively, click Skip.
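The Delta Lake Instance address and Personal Access Token entered in the Connection page are standard Databricks REST API credentials, so you can sanity-check them yourself before authorizing. The sketch below is not part of the Adverity workflow; the host and token values are placeholders.

```python
import urllib.request

def auth_headers(token: str) -> dict:
    # Databricks REST API calls authenticate with the personal access
    # token sent as a Bearer token.
    return {"Authorization": f"Bearer {token}"}

def token_check_request(host: str, token: str) -> urllib.request.Request:
    # Builds (but does not send) a request against a lightweight endpoint;
    # sending it and receiving HTTP 200 indicates host and token are valid.
    return urllib.request.Request(f"{host}/api/2.0/clusters/list",
                                  headers=auth_headers(token))

req = token_check_request("https://example.cloud.databricks.com", "dapiXXXX")
print(req.full_url)
```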
3.2. Ask someone else to authorize the new connection
To ask someone else to authorize the new connection:
- In the Email field, enter the email address of the person you want to ask to authorize the new connection.
- (Optional) Customize the message and set notification preferences.
- Click Send Access Request.
4. Assign Databricks as Destination
You can assign the Databricks Destination to the Workspace in the Assignment page of the process explained in section 3.1.
Alternatively, to assign the Databricks Destination to a workspace:
- In Adverity, click the Workspace Settings icon.
- Click Administration.
- In Workspace►Settings►Destination, select the Databricks Destination.