S3 Data Ingestion - Best Practices

phaneendhar_lingam · 2020-04-23T06:33:11+00:00

S3 Best Practices Sample job creation: *S3 Dataset Task in Bionic Rules: https://support.gainsight.com/SFDC_Edition/Rules_Engine/Admin_Guides/S3_Dataset_Task_in_Bionic_Rules *Use File Analyzer to proactively identify and resolve any issues in the file. https://support.gainsight.com/Gainsight_NXT/Gainsight_Analyzer/Admin_Guides/File_Analyzer_Admin_Guide S3 Job Recommendations: Use Bionic rules for data ingestion (for better error logging and/ or for further transformations before loading to target object) Enable archiving for future reference Enable post file upload to trigger rule - ‘Event’ based scheduling Ensure a minimum gap of two hours between file uploads. Whenever possible, break large files to chunks of 100MB for easy debugging incase of errors Use notification emails for the success/ failed jobs. File Naming conventions: Avoid spaces in the file names File name given in the job configuration and the file loaded in the S3 bucket should be same Check for case sensitivity File Properties: Ensure that file properties given in the job match with the file Field Separator (comma, pipe, semicolon, space, tab) Text Qualifier (Double Quote, Single Quote) Escape Character (Backslash, Double Quote, Single Quote) Compression Type (None, bzip, gzip) Character Encoding (UTF-8, UTF-16, UTF-16BE, ASCII, ISO-8859-1, ISO-8859-2, ISO-8859-3) Make sure that encoding is not in BOM Field Mappings: Source CSV field should match exactly with CSV header names (case sensitive) AVOID having leading or trailing spaces in CSV header names AVOID placing special characters in CSV header names (unless required) AVOID having duplicate CSV header names For Decimal Number fields mention the number of decimal places to be retained For Date or DateTime type fields specify the required date format using the gear icon Recommended DATE format MM-dd-yyyy Recommended DATETIME format MM-dd-yyyy HH:mm or MM-dd-yyyy HH:mm:ss Update the mapping immediately if you’re making changes to the existing field labels in the source or in the target object Data load Operations: Identifier keys not required for INSERT Identifier keys are mandatory for UPDATE and UPSERT jobs Make sure that the identifier keys are always populated Use standard Id fields and date fields for identifiers as a best practice Dos and Don’ts: Do not send empty files for data ingestion, triggers unnecessary jobs/ emails Do not send files without headers Do not send duplicate records in a file even if you’re using Update or Upsert Identifiers should never be NULL, if you’re using multiple identifiers make sure that all of them are properly populated Use text qualifiers (“ or ‘) as much as possible - this makes sure that file processing doesn’t break upon encountering delimiters or escape characters or special characters within the text Make sure that date format is consistent across a field (or column) If you’re passing long/ rich text values - ensure that text qualifiers are properly handled Use escape characters wherever possible For picklist values, make sure that they are already available under existing dropdown list categories

4 years ago
23 April 2020
2 replies
585 views

+1

phaneendhar_lingam
Veteran Rookie
14 replies

S3 Best Practices

Sample job creation:

V2ZMwHD5-ZcsPJ38fXfY4C75N3e0WjVFRPmhl0pLpoqbXBiDpRhNFoctSgYjFVdKhjqXJGCybA3dgxNPb6HTEL2RMhRv5m2wHm-6YjR18Elml7X2H62piDNntLSJayd07AUTPos-

*S3 Dataset Task in Bionic Rules:

https://support.gainsight.com/SFDC_Edition/Rules_Engine/Admin_Guides/S3_Dataset_Task_in_Bionic_Rules

*Use File Analyzer to proactively identify and resolve any issues in the file.

https://support.gainsight.com/Gainsight_NXT/Gainsight_Analyzer/Admin_Guides/File_Analyzer_Admin_Guide

S3 Job Recommendations:

Use Bionic rules for data ingestion (for better error logging and/ or for further transformations before loading to target object)
Enable archiving for future reference
Enable post file upload to trigger rule - ‘Event’ based scheduling
Ensure a minimum gap of two hours between file uploads.
Whenever possible, break large files to chunks of 100MB for easy debugging incase of errors
Use notification emails for the success/ failed jobs.

File Naming conventions:

Avoid spaces in the file names
File name given in the job configuration and the file loaded in the S3 bucket should be same
Check for case sensitivity

File Properties:

Ensure that file properties given in the job match with the file
- Field Separator (comma, pipe, semicolon, space, tab)
- Text Qualifier (Double Quote, Single Quote)
- Escape Character (Backslash, Double Quote, Single Quote)
- Compression Type (None, bzip, gzip)
- Character Encoding (UTF-8, UTF-16, UTF-16BE, ASCII, ISO-8859-1, ISO-8859-2, ISO-8859-3)
- Make sure that encoding is not in BOM

Field Mappings:

Source CSV field should match exactly with CSV header names (case sensitive)
AVOID having leading or trailing spaces in CSV header names
AVOID placing special characters in CSV header names (unless required)
AVOID having duplicate CSV header names
For Decimal Number fields mention the number of decimal places to be retained
For Date or DateTime type fields specify the required date format using the gear icon
- Recommended DATE format MM-dd-yyyy
- Recommended DATETIME format MM-dd-yyyy HH:mm or MM-dd-yyyy HH:mm:ss
Update the mapping immediately if you’re making changes to the existing field labels in the source or in the target object

Data load Operations:

Identifier keys not required for INSERT
Identifier keys are mandatory for UPDATE and UPSERT jobs
Make sure that the identifier keys are always populated
Use standard Id fields and date fields for identifiers as a best practice

Dos and Don’ts:

Do not send empty files for data ingestion, triggers unnecessary jobs/ emails
Do not send files without headers
Do not send duplicate records in a file even if you’re using Update or Upsert
Identifiers should never be NULL, if you’re using multiple identifiers make sure that all of them are properly populated
Use text qualifiers (“ or ‘) as much as possible - this makes sure that file processing doesn’t break upon encountering delimiters or escape characters or special characters within the text
Make sure that date format is consistent across a field (or column)
If you’re passing long/ rich text values - ensure that text qualifiers are properly handled
Use escape characters wherever possible
For picklist values, make sure that they are already available under existing dropdown list categories

2 replies

A

+1

aparimala
Team Captain
89 replies
3 years ago
30 April 2020

I wish I saw this two weeks ago! We are in the process of migrating to rules engine to load data from s3 data task instead of connector 1.0. If this was in a wizard for migration it saves admins a lot of time when they encounter issues or miss steps.

Userlevel 6

+4

jean.nairon
All Star
421 replies
3 years ago
10 July 2020

If you are running into issues where your file will not load because of a partitioning issue, it’s likely that your file has improperly formatted quotations and commas.

I have processed a lot of S3 files and the most common issue I have found is if there are quotations in your data. For example, I ran into a contact record where someone had entered the following in the full name field: John Smith aka “Smithy”

When exporting this to a CSV with quotations around the field, it appeared like this: “John Smith aka “Smithy””,

Two double quotes followed by a comma will create a partitioning issue.

Another known challenge that can easily happen is with backslashes. If a field includes a backslash like Account Name = AlphaRomeo, it can cause issues.

If you are having issues loading your file, make sure you don’t have two double quotes followed by a coma and also check for the backslashes. Hope this helps!

Reply

Didn't find what you were looking for?

Need a navigation rescue?

Sign up

Welcome to the Gainsight Community

Scanning file for viruses.

This file cannot be downloaded