S3 Best Practices
Sample job creation:
* S3 Dataset Task in Bionic Rules: https://support.gainsight.com/SFDC_Edition/Rules_Engine/Admin_Guides/S3_Dataset_Task_in_Bionic_Rules
* Use File Analyzer to proactively identify and resolve any issues in the file.
S3 Job Recommendations:
- Use Bionic Rules for data ingestion (for better error logging and/or for further transformations before loading to the target object)
- Enable archiving for future reference
- Enable post-file-upload triggering of the rule (‘Event’-based scheduling)
- Ensure a minimum gap of two hours between file uploads
- Whenever possible, break large files into chunks of 100 MB for easier debugging in case of errors
- Use notification emails for both successful and failed jobs
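The 100 MB chunking recommendation can be applied before upload with a small script. The following is a minimal sketch, not a Gainsight utility: the function name `split_csv`, the `_partNNN` naming scheme, and the UTF-8 default are illustrative assumptions. It also assumes that no quoted field contains an embedded line break, since it splits on physical lines.

```python
import os

CHUNK_BYTES = 100 * 1024 * 1024  # 100 MB target taken from the recommendation


def split_csv(path, out_dir, max_bytes=CHUNK_BYTES, encoding="utf-8"):
    """Split a CSV into parts of roughly max_bytes, repeating the header in each part.

    Hypothetical helper: assumes one record per physical line.
    """
    os.makedirs(out_dir, exist_ok=True)
    base, ext = os.path.splitext(os.path.basename(path))
    parts = []
    with open(path, "r", encoding=encoding, newline="") as src:
        header = src.readline()
        out, size = None, 0
        for line in src:
            line_bytes = len(line.encode(encoding))
            # Start a new part when the current one would exceed the limit.
            if out is None or size + line_bytes > max_bytes:
                if out:
                    out.close()
                part_path = os.path.join(
                    out_dir, f"{base}_part{len(parts) + 1:03d}{ext}"
                )
                out = open(part_path, "w", encoding=encoding, newline="")
                out.write(header)
                size = len(header.encode(encoding))
                parts.append(part_path)
            out.write(line)
            size += line_bytes
        if out:
            out.close()
    return parts
```

Each part keeps the original header row, so the same field mappings should apply to every chunk. Note that the part file names differ from the source file name, so the job configuration would need to account for them.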
File Naming Conventions:
- Avoid spaces in file names
- The file name given in the job configuration and the name of the file loaded into the S3 bucket must be identical
- File names are case sensitive, so check that the case matches as well
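The naming rules above can be checked before a file is uploaded. This is a minimal sketch; `check_file_name` is a hypothetical helper, not part of any Gainsight or AWS API.

```python
def check_file_name(configured_name, uploaded_name):
    """Return a list of naming problems; an empty list means no problem was found."""
    problems = []
    if any(ch.isspace() for ch in uploaded_name):
        problems.append("file name contains whitespace")
    if configured_name != uploaded_name:
        if configured_name.lower() == uploaded_name.lower():
            problems.append("file name differs from the job configuration only by case")
        else:
            problems.append("file name does not match the job configuration")
    return problems
```

For example, `check_file_name("accounts.csv", "Accounts.csv")` reports a case-only mismatch, which is easy to miss by eye.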
File Properties:
- Ensure that the file properties given in the job match the file:
  - Field Separator (comma, pipe, semicolon, space, tab)
  - Text Qualifier (Double Quote, Single Quote)
  - Escape Character (Backslash, Double Quote, Single Quote)
  - Compression Type (None, bzip, gzip)
  - Character Encoding (UTF-8, UTF-16, UTF-16BE, ASCII, ISO-8859-1, ISO-8859-2, ISO-8859-3)
- Make sure that the file is not saved with a BOM (byte order mark)
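A BOM is invisible in most editors, so it helps to check for it programmatically. The sketch below uses the BOM constants from Python's standard `codecs` module; the helper names `detect_bom` and `file_bom` are illustrative assumptions.

```python
import codecs

# Order matters: the UTF-32 LE BOM starts with the same bytes as the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32 LE"),
    (codecs.BOM_UTF32_BE, "UTF-32 BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16 LE"),
    (codecs.BOM_UTF16_BE, "UTF-16 BE"),
]


def detect_bom(data: bytes):
    """Return the name of the BOM that data starts with, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


def file_bom(path):
    """Check only the first four bytes of a file, which is enough for any BOM."""
    with open(path, "rb") as f:
        return detect_bom(f.read(4))
```

If a UTF-8 BOM is found, one option is to re-read the file in Python with the `utf-8-sig` codec, which drops the BOM, and write it back out as plain `utf-8`.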
Field Mappings:
- Source CSV field names in the mapping should match the CSV header names exactly (case sensitive)
- AVOID leading or trailing spaces in CSV header names
- AVOID special characters in CSV header names (unless required)
- AVOID duplicate CSV header names
- For Decimal Number fields, specify the number of decimal places to be retained
- For Date or DateTime fields, specify the required date format using the gear icon:
  - Recommended DATE format: MM-dd-yyyy
  - Recommended DATETIME format: MM-dd-yyyy HH:mm or MM-dd-yyyy HH:mm:ss
- Update the mapping immediately if you change existing field labels in the source or in the target object
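The header rules and recommended date formats can be enforced when the file is generated. The following is a minimal sketch: `check_headers` is a hypothetical helper, and treating anything other than letters, digits, underscores, and spaces as a "special character" is an assumption, since the guidance does not define the term. The format strings are the Python `strftime` equivalents of MM-dd-yyyy and MM-dd-yyyy HH:mm:ss.

```python
import re
from collections import Counter
from datetime import datetime

# Python strftime equivalents of the recommended formats.
DATE_FMT = "%m-%d-%Y"               # MM-dd-yyyy
DATETIME_FMT = "%m-%d-%Y %H:%M:%S"  # MM-dd-yyyy HH:mm:ss


def check_headers(headers):
    """Return a list of header problems; an empty list means no problem was found."""
    problems = []
    for name, count in Counter(headers).items():
        if count > 1:
            problems.append(f"duplicate header: {name!r}")
    for name in headers:
        if name != name.strip():
            problems.append(f"leading or trailing space: {name!r}")
        # Assumption: only letters, digits, underscores, and spaces are "non-special".
        if re.search(r"[^A-Za-z0-9_ ]", name.strip()):
            problems.append(f"special character: {name!r}")
    return problems
```

When exporting dates, `datetime(2024, 3, 9).strftime(DATE_FMT)` yields `03-09-2024`, which matches the recommended DATE format.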
Data Load Operations:
- Identifier keys are not required for INSERT jobs
- Identifier keys are mandatory for UPDATE and UPSERT jobs
- Make sure that the identifier keys are always populated
- As a best practice, use standard Id fields and date fields as identifiers
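Before an UPDATE or UPSERT load, the file can be scanned for rows whose identifier keys are blank. This is a minimal sketch using the standard `csv` module; `rows_missing_identifiers` and the column names in the usage note are hypothetical, and the default comma-separated, double-quoted dialect is assumed.

```python
import csv
import io


def rows_missing_identifiers(csv_text, identifier_fields):
    """Return data-row numbers (1-based, header excluded) with any blank identifier."""
    reader = csv.DictReader(io.StringIO(csv_text))
    bad = []
    for row_number, row in enumerate(reader, start=1):
        # A missing column, an empty string, and whitespace-only all count as blank.
        if any(not (row.get(field) or "").strip() for field in identifier_fields):
            bad.append(row_number)
    return bad
```

For instance, calling it with `["Account Id", "Date"]` flags every row where either column is empty, so those rows can be fixed or dropped before the job runs.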
Dos and Don’ts:
- Do not send empty files for data ingestion; they trigger unnecessary jobs and emails
- Do not send files without headers
- Do not send duplicate records in a file, even if you’re using Update or Upsert
- Identifiers should never be NULL; if you’re using multiple identifiers, make sure that all of them are populated
- Use text qualifiers (" or ') as much as possible, so that file processing doesn’t break on delimiters, escape characters, or special characters within the text
- Make sure that the date format is consistent across a field (column)
- If you’re passing long or rich text values, ensure that text qualifiers are properly handled
- Use escape characters wherever possible
- For picklist values, make sure that they already exist under the existing dropdown list categories
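Two of these points, text qualifiers and duplicate records, are easy to handle when the file is produced. The sketch below uses Python's standard `csv` module; `write_quoted_csv` and `duplicate_keys` are hypothetical helpers. Note that `csv.writer` escapes an embedded double quote by doubling it, so this output presumably corresponds to a job configured with Double Quote as both Text Qualifier and Escape Character; that pairing is an assumption, not something the guidance states.

```python
import csv
import io


def write_quoted_csv(headers, rows):
    """Serialize rows with every field wrapped in double quotes (the text qualifier)."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(headers)
    writer.writerows(rows)
    return buf.getvalue()


def duplicate_keys(rows, key_indexes):
    """Return identifier tuples that appear on more than one row."""
    seen, dupes = set(), set()
    for row in rows:
        key = tuple(row[i] for i in key_indexes)
        if key in seen:
            dupes.add(key)
        seen.add(key)
    return sorted(dupes)
```

With every field qualified, a value such as `Acme, Inc.` no longer splits into two columns on the comma, and running `duplicate_keys` on the identifier columns before export catches repeated records.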