WHAT IS A BUNDLE STREAM?
A bundle Datastream is a special kind of Datastream that combines data from one or more existing Datastreams. It has many applications, but two main use cases:
- Combining multiple separate Datastreams (e.g. different sources).
- Combining multiple extracts into one extract.
Note that the bundle Datastream was previously called "fork", so the names of all its extracts are prefixed with "fork-" followed by the Datastream ID, e.g. "fork-123".
PRE-REQUISITES
- No specific Connection is necessary to support a Bundle, but at least one Datastream must already exist in the Workspace before a Bundle Datastream can be created.
HOW TO CREATE A BUNDLE STREAM
- In the Connect Element, click "+Add Datastream".
- Select Bundle from the list of available Datastreams.
- Configure the Bundle (see below). Be sure to select at least one existing Datastream to include within the Bundle.
- Save the Bundle.
CONFIGURATION
- Workspace:
The Workspace in which you want this Datastream to reside.
- Datastreams:
From the drop-down, select all of the individual Datastreams that you wish to include within this Bundle.
- Match Options:
There are multiple options which can be selected to find existing extracts in the selected Datastreams, all of which operate in combination with the selected date range. For all options, all extracts (status: collected or status: imported) from the chosen Datastreams will be returned based on the Match Options and the Fetch date range.
- Pattern:
The default match option. If the underlying chosen Datastream has "Manage Extract Names" enabled in its Configuration options, then this can be used in combination with a regular expression pattern that includes placeholders for the date. Extracts will be returned if they match both the regular expression pattern and the fetch date range selected.
Default Regular Expression:
^.*-%Y%m%d.*\.csv$
- Created Date:
Extracts will be returned only if the fetch date and the creation date of the extract match.
- Scheduled Date:
Extracts will be returned if the fetch date and the scheduled date of the extract match. Note that the scheduled date refers to the earliest date contained within an extract, e.g. for an extract that contains 30 days of data, it is the first of those days.
To find the scheduled or created date, look in the metadata section of the extract.
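As an illustration of the Pattern option, the sketch below expands the default regular expression for each day in a fetch date range and keeps the extract names that match. It assumes the date placeholders follow strftime conventions (%Y = four-digit year, %m = two-digit month, %d = two-digit day); the function names and the extract names are hypothetical, and this is not the product's own implementation.

```python
import re
from datetime import date, timedelta

DEFAULT_PATTERN = r"^.*-%Y%m%d.*\.csv$"


def pattern_for(day, template=DEFAULT_PATTERN):
    """Fill the date placeholders for one day and compile the result.

    Assumption: placeholders behave like strftime codes.
    """
    expanded = (
        template.replace("%Y", f"{day.year:04d}")
        .replace("%m", f"{day.month:02d}")
        .replace("%d", f"{day.day:02d}")
    )
    return re.compile(expanded)


def matching_extracts(names, start, end, template=DEFAULT_PATTERN):
    """Return extract names that match the pattern for any day in [start, end]."""
    matched = []
    day = start
    while day <= end:
        regex = pattern_for(day, template)
        for name in names:
            if regex.match(name) and name not in matched:
                matched.append(name)
        day += timedelta(days=1)
    return matched


# Hypothetical extract names from an underlying Datastream.
names = [
    "campaigns-20240105.csv",
    "campaigns-20240106.csv",
    "campaigns-20240105.json",  # wrong extension, never matches
    "campaigns-20240110.csv",   # outside the fetch date range below
]
selected = matching_extracts(names, date(2024, 1, 5), date(2024, 1, 6))
print(selected)
```

With a fetch date range of 5 to 6 January 2024, only the two .csv extracts dated within that range are selected.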
- Apply Schema Mapping:
This is ticked by default. It applies the schema mapping assigned to each chosen Datastream, meaning that only mapped fields are shown, and they appear in the bundle stream under their schema-mapped names.
- This also automatically combines the mapped columns from different Datastreams into one column and adds two new columns, "dt_datastream_name" and "dt_datasource", to identify the origin of each row.
- If left unticked, all fields from the underlying Datastreams appear under their original names.
- In order to harmonize data, the underlying fields should have the same schema mapping applied.
Example:
ga:campaign -> campaign
Campaign -> campaign
CampaignName -> campaign
Read more about Data Schema and Schema Mapping.
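The effect of Apply Schema Mapping can be sketched roughly as follows. This is a simplified model, not the product's implementation: the function `bundle_rows`, the Datastream names, the datasource labels, and the row values are all hypothetical. Only the field mappings (ga:campaign, Campaign -> campaign) and the two added column names come from the text above. The sketch also assumes the two origin columns are added only when the option is ticked.

```python
def bundle_rows(streams, apply_schema_mapping=True):
    """Combine rows from several Datastreams into one list of rows.

    Each stream is a dict with hypothetical keys:
    'name', 'datasource', 'mapping' (source field -> target field), 'rows'.
    """
    bundled = []
    for stream in streams:
        mapping = stream["mapping"]
        for row in stream["rows"]:
            if apply_schema_mapping:
                # Keep only mapped fields, renamed to their target names.
                out = {mapping[k]: v for k, v in row.items() if k in mapping}
                # Origin columns (assumed to be added only in this mode).
                out["dt_datastream_name"] = stream["name"]
                out["dt_datasource"] = stream["datasource"]
            else:
                # Unticked: all fields keep their original names.
                out = dict(row)
            bundled.append(out)
    return bundled


streams = [
    {
        "name": "GA daily",  # hypothetical Datastream name
        "datasource": "Google Analytics",
        "mapping": {"ga:campaign": "campaign"},
        "rows": [{"ga:campaign": "spring_sale", "ga:unmapped": 1}],
    },
    {
        "name": "Ads daily",  # hypothetical Datastream name
        "datasource": "Ads platform",
        "mapping": {"Campaign": "campaign"},
        "rows": [{"Campaign": "spring_sale"}],
    },
]

mapped = bundle_rows(streams)
unmapped = bundle_rows(streams, apply_schema_mapping=False)
print(mapped)
print(unmapped)
```

In the mapped result both rows share a single "campaign" column and the unmapped field is dropped; in the unmapped result each row keeps its source-specific field names.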
- Concatenate:
This is ticked by default. This will combine all extracts for a fetched time range instead of creating separate extracts per day.
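The difference between the two Concatenate settings can be pictured with a small sketch. It models an extract as a plain list of rows keyed by day; the function name `bundle_extracts` and the sample rows are hypothetical, and the real product may order or name its output differently.

```python
from datetime import date


def bundle_extracts(extracts_by_day, concatenate=True):
    """Return the output extracts for a fetched date range.

    extracts_by_day maps each day to the rows fetched for that day.
    """
    days = sorted(extracts_by_day)
    if concatenate:
        # One combined extract covering the whole fetched range.
        return [[row for day in days for row in extracts_by_day[day]]]
    # One separate extract per day.
    return [list(extracts_by_day[day]) for day in days]


fetched = {
    date(2024, 1, 6): [{"campaign": "b"}],
    date(2024, 1, 5): [{"campaign": "a"}, {"campaign": "c"}],
}

combined = bundle_extracts(fetched)
per_day = bundle_extracts(fetched, concatenate=False)
print(len(combined), len(per_day))
```

With Concatenate ticked, the two fetched days yield a single extract of three rows; unticked, they yield two extracts, one per day.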
BUNDLING STREAMS FROM PARENT WORKSPACES
By default, a bundle stream will not be able to access data from streams which reside in a parent Workspace. However, if this is needed, the relevant Datastreams can be made available through the Share with Children option.
- Navigate to the relevant Datastream in the parent-level Workspace.
- In the left-hand menu, under Advanced Settings, click on Other.
- Tick the checkbox 'Share with children'.
- The Datastream is now available for its child Workspaces.