CREATING A FILE DATASTREAM
To create a File-type Datastream:
- Navigate to Connect > Connections > +Add New Datastream.
- Select File from the list of available Connections.
- Input the authorization credentials required to access the Datasource File server.
- Configure the Datastream (see File-Specific Configuration Options below).
- Assign the Datastream to a Workspace.
- The most common file types are CSV, XLS, and XLSX.
- If the source file is in a .zip or gzip (.gz) container, this must be defined in the File Pattern, e.g. ^%Y%m%d.gz$ or ^%Y%m%d.zip$
FILE-SPECIFIC CONFIGURATION OPTIONS
If a fetch finds no available file or valid URL link, a warning will be sent to the user via the 'Issues' pane in the Overview.
- File Pattern:
A regular expression that is matched against file names to recognize the files to fetch.
- Zip Match:
The file-type suffix for which Adverity should search any attached .zip files.
- Filename Date Match:
If attached files follow a consistent name format to define e.g. their date of creation, this field defines the filename format that will be used, e.g. "filename-%Y-%m-%d".
- Filename Date Pattern:
If attached files follow a consistent name format to define e.g. their date of creation, this field defines the date expression format that will be used, e.g. "%Y-%m-%d".
- Keep filename:
If ticked, the extract will duplicate the name of the original source file. If the source file is always of the same name, ticking this would cause each extract to also be identically named, and therefore consistently be overwritten as a new file with each new fetch.
- Has Adverity Header
If the Datasource is also an Adverity product, ticking this allows the Datastream to automatically parse header information in the standard Adverity format.
A drop-down menu of compatible file formats. Select the appropriate option for your files: RAW, CSV, EXCEL, AVRO, or PARQUET.
- Source Encoding (RAW, CSV, EXCEL, AVRO, PARQUET):
Select the encoding convention used by your file structure.
- Delimiter (CSV):
A one-character string used to separate fields.
- Quote Char (CSV):
A one-character string used to quote fields containing special characters, such as the delimiter or quotechar, or which contain new-line characters.
- Quoting (CSV):
Controls when quotes should be recognized when parsing files.
- Sheet (EXCEL):
If your files are Excel files that use multiple sheets, specify the sheet that should be parsed (e.g. "Sheet1", "Conversion Figures").
- Column Offset (EXCEL):
Allows you to skip columns in the sheet that should not be imported, e.g. if Column A contains header information unnecessary to Adverity.
- Row Offset (RAW, CSV, EXCEL, AVRO, PARQUET):
Allows you to skip rows in the sheet that should not be imported, e.g. if Row 1 contains header information unnecessary to Adverity.
- Process all:
Process all files that fit the filter criteria, not just the most recently uploaded.
- Recursive:
Search all child directories for matching files to extract, e.g. "\source_directory\sub_directory\".
- Concatenate files:
Concatenate entries across all files into a single consolidated extract.
- Delete source:
Delete the source file after processing has been successfully completed.
- Move to:
Specify a file directory path in which the extract will be saved, e.g. "\extracts\".
- Move to hierarchy:
If a "move to" path has been defined and "Recursive" is ticked, extracts discovered in a child directory will be saved using the same folder structure, e.g. "\extracts\sub_directory\".
- Sortorder:
Define the master field by which entries will be ordered:
- Filename - Alphabetically listed.
- Modification Time - Listed chronologically based on date of last modification.
- Date Match - Listed chronologically as per the "Filename date match" field.
- Reverse sortorder:
Reverse the list order based on the above (e.g. Filename: A>Z or Z>A).
- Ignore file time:
Disregards the file timestamp and uses the Filename Date Match instead.
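The effect of the sort order and reverse options can be illustrated with a small Python sketch. This is not Adverity's implementation: the function names, the sample file names, and the single combined date format "report-%d-%m-%Y.csv" are assumptions made for the example. It only shows why sorting by a date parsed from the filename can differ from a plain alphabetical sort.

```python
from datetime import datetime

# Hypothetical filename format, in the spirit of "filename-%Y-%m-%d".
DATE_MATCH = "report-%d-%m-%Y.csv"

def filename_date(name, date_match=DATE_MATCH):
    """Parse the date embedded in a filename (assumed format)."""
    return datetime.strptime(name, date_match)

def sort_files(names, sort_order="filename", reverse=False):
    """Order names alphabetically or by the date found in the filename."""
    key = filename_date if sort_order == "date_match" else None
    return sorted(names, key=key, reverse=reverse)

names = [
    "report-03-01-2018.csv",
    "report-31-12-2017.csv",
    "report-01-02-2018.csv",
]

by_name = sort_files(names, sort_order="filename")
by_date = sort_files(names, sort_order="date_match")
newest_first = sort_files(names, sort_order="date_match", reverse=True)
```

With day-first dates in the filename, the alphabetical order puts the February file first, while the date-based order starts with the December 2017 file.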
TIME RANGE OPTIONS
- Time range options are currently not supported for the File datasource.
- To fetch files of a certain date, the file name must contain the date. See the section FETCHING FILES OF A CERTAIN DATE below for further information.
FETCHING FILES OF A CERTAIN DATE
- Under Configuration --> File Matching Options --> File pattern you can use %Y, %m, %d as placeholders in the regular expression for the file name.
- In this case only files where the name matches the date of the fetch are processed.
- The placeholders always relate to the date of the current fetch. It is currently not possible to define a range of dates that should be fetched.
- However, when Preset: Yesterday or Preset: Today is selected on the Time Range tab, the files matching yesterday's or today's date are processed.
- Tick Configuration --> File Processing --> Process all to process all files matching the regular expression
- Example: if the file pattern is set to filename-%Y-%m-%d, a fetch performed on 1 January 2018 would only fetch files whose name equals filename-2018-01-01.
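The placeholder behaviour described above can be sketched in Python. This is an illustration only, not Adverity's matching code: the helper names are hypothetical, and the use of re.search (rather than a full-string match) is an assumption about how the pattern is applied.

```python
import re
from datetime import date

def expand_pattern(pattern, fetch_date):
    """Replace %Y, %m and %d with the zero-padded parts of the fetch date."""
    return (
        pattern
        .replace("%Y", f"{fetch_date.year:04d}")
        .replace("%m", f"{fetch_date.month:02d}")
        .replace("%d", f"{fetch_date.day:02d}")
    )

def matching_files(pattern, fetch_date, filenames):
    """Return the filenames matched by the expanded regular expression."""
    regex = re.compile(expand_pattern(pattern, fetch_date))
    return [name for name in filenames if regex.search(name)]

files = ["filename-2018-01-01", "filename-2018-01-02", "other-2018-01-01"]
matched = matching_files("filename-%Y-%m-%d", date(2018, 1, 1), files)
zipped = expand_pattern("^%Y%m%d.gz$", date(2018, 1, 1))
```

Here matched contains only "filename-2018-01-01", and the compressed-file pattern expands to "^20180101.gz$" for the same fetch date.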
FILENAME DATE MATCH + FILENAME DATE PATTERN
- When populated, these two options enable the Datastream to sort file processing based on Sortorder --> Date Match. They are not related to which dates are fetched.
- Instead of using the built-in parsing, the parsing can also be done in a script by using either the CSV or the XLSX command. See Import Operations in the Adverity Transformation Reference for the full list of parameters and supported file types.
- It may be useful to apply the instruction convertnumbers to any column containing metrics to ensure they are imported correctly into your target destination.
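To illustrate what the CSV options (Delimiter, Quote Char, Quoting) and a numeric conversion of metric columns do in general, here is a minimal sketch using Python's standard csv module. It does not use Adverity's CSV command or convertnumbers instruction; the sample data and column names are made up for the example.

```python
import csv
import io

# Sample semicolon-delimited data; the first campaign name contains the
# delimiter, so it is wrapped in the quote character.
raw = 'campaign;clicks;cost\n"Spring; Sale";120;45.50\nAutumn;80;30.00\n'

reader = csv.DictReader(
    io.StringIO(raw),
    delimiter=";",               # one-character field separator
    quotechar='"',               # one-character quote for special fields
    quoting=csv.QUOTE_MINIMAL,   # quotes only recognised where needed
)

rows = []
for row in reader:
    # Convert metric columns from text to numbers before loading them.
    row["clicks"] = int(row["clicks"])
    row["cost"] = float(row["cost"])
    rows.append(row)
```

Without the conversion step, the clicks and cost values would remain strings, which is the kind of problem a numeric conversion on metric columns is meant to avoid.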