Snowflake Connector: Timing is Everything

  • 9 November 2023
  • 1 reply
  • 57 views

Userlevel 4
Badge +4

 

Snowflake is one of the fastest growing data platforms in the market, and use of the Snowflake Connector among Gainsight’s clients has been growing rapidly as well. So it’s important that everyone understands how to best use this Connector for accurate data ingests.

When the Snowflake connector runs as scheduled, it checks for data with a timestamp that falls between the current time and the last time the connector was run. The timestamp field is designated within the connector, so you can choose the right field for the connector to use to determine when that row was added or modified. This field is labeled “Reference for last modified date” within the connector.

If that timestamp reflects the actual time that the row was added to Snowflake, everything works as expected.

Example Sequence of Events 1 - Works as Expected

  • The Snowflake connector is configured to use the Date Added field to determine which rows to pull into Gainsight.
  • At May 1 00:00 (midnight), the connector runs and finds no records with a Date Added after April 30 00:00, which is the last time the connector ran.
  • A new row is added to Snowflake with a Date Added of May 1, 12:00 (noon)
  • At May 2 00:00, the connector runs and checks for rows with a Date Added after May 1 00:00, which is the last time the connector ran.
  • The new row is returned because it has a Date Added after May 1 00:00

However in some circumstances, data may be added to Snowflake on a delay. This could result in either no data pulled into Gainsight, or some data being missed. In the example scenario below, the data is first stored in a third party system and only synced to Snowflake once a day.

Example Sequence of Events 2 - Snowflake Updated on a Delay

  • The Snowflake connector is configured to use the Date Added field to determine which rows to pull into Gainsight.
  • At May 1 00:00 (midnight), the connector runs and finds no records with a Date Added after April 30 00:00, which is the last time the connector ran.
  • A new row is created in a third party system with a Date Added of May 1, 12:00 (noon)
  • At May 2 00:00, the connector runs and checks for rows with a Date Added after May 1 00:00, which is the last time the connector ran.
  • No rows are returned because the new row only exists in the third party system, it hasn’t been synced to Snowflake yet.
  • At May 2 01:00, an integration runs to move data from the third party system to Snowflake.
  • As of May 2 01:01, Snowflake has a new row with a Date Added of May 1, 12:00
  • At May 3 00:00, the connector runs and checks for rows with a Date Added after May 2 00:00, which is the last time the connector ran.
  • No rows are returned because the new row has a Date Added prior to May 2.

When troubleshooting why that row wasn’t pulled in via the connector, you can see that there is data matching the timeframe from the run at May 2 00:00 (May 1 00:00 through May 2 00:00). That makes it appear as if the connector isn’t working correctly- however the connector is working as designed. The row in question wasn’t in Snowflake at the time the connector ran for that timeframe.

Even if data is added to Snowflake in real-time, there can be rows that are missed due to time zone differences.

Example Sequence of Events 3 - Time Zone Issue

  • The Snowflake connector is configured to use the Date Added field to determine which rows to pull into Gainsight.
  • At May 2 00:00 Eastern Time, the connector runs and checks for rows with a Date Added after May 1 00:00, which is the last time the connector ran.
  • No rows are returned.
  • At May 2 02:00 Eastern Time, a new record is added to Snowflake. The Date Added is populated with May 1 11:00 because the timestamp is using Pacific Time.
  • At May 3 00:00, the connector runs and checks for rows with a Date Added after May 2 00:00, which is the last time the connector ran.
  • No rows are returned, because the new row does not have a Date Added within that range.

If you are running into either of the issues above, you have a few options.

  1. Select a different timestamp field to use with the connector

In example #2 above, the Date Added field indicated the time that the row was added to the third party system, not to Snowflake. If there was a different field in Snowflake that did reflect the time the row was created in Snowflake, the connector settings could be adjusted to use that field instead.

  1. Instead of using the connector, bring in the data using Rules Engine.

If there isn’t a field that can be used to determine when the rows have been added to Snowflake, you can bring in Snowflake data using Rules Engine rather than the connector. In Rules Engine you can customize the date range that is used to find new records.


1 reply

We had the third scenario, a time zone issue. We resolved it by using a Data Designer combined with a Rule at the suggestion of GS support.  

Reply