This FAQ article answers questions related to the data available within the Customer Communities (CC) Data Lake.
- For more information about connecting Business Intelligence (BI) tools to CC Data Lake, refer to the dedicated KB articles.
- For guidelines on starting data analysis, refer to the Getting Started with Data Analytics in CC Data Lake article.
The FAQs below explain what data is available to you and how it is structured. If you have more questions about this topic, please post them below.
What does it mean that the Customer Communities Data Lake Silver Layer contains normalized data?
It means that the data within the Silver Layer has been organized and structured according to the principles of data normalization. Here are several key characteristics and practices:
- Minimized Data Redundancy: In the Silver Layer, we aim to minimize data redundancy. Data redundancy occurs when the same information is stored in multiple places within a database, which can lead to inconsistencies and data anomalies. A normalized structure reduces or eliminates redundant data, which improves data consistency.
- Multiple Related Tables: Data in the Silver Layer is distributed across multiple related tables, each focused on a specific entity or type of data. Foreign keys link the tables to one another and express the relationships between entities. This structure helps maintain data integrity and consistency.
- Efficient Updates: Normalization makes updates and modifications in the Silver Layer more efficient. When a piece of information changes, it only needs to be updated in one place, which reduces the risk of inconsistencies and simplifies data maintenance.
- More Complex Queries: While normalization reduces redundancy and ensures data integrity, it often makes SQL queries for data retrieval more complex. Because the data is spread across multiple tables, queries frequently need joins to retrieve related information.
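To make the characteristics above concrete, here is a minimal sketch in Python that uses the standard-library sqlite3 module as a local stand-in for the Data Lake (the Silver Layer itself is queried through Athena, not SQLite). The users and topics tables and their columns are hypothetical and do not reflect the actual CC Data Lake schema. The sketch shows one entity per table, a foreign key linking them, a single-row update, and the join needed to read related data back.

```python
import sqlite3

# Hypothetical normalized schema: one entity per table, linked by a
# foreign key. Names are illustrative, not the actual CC Data Lake schema.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE users (
    user_id   INTEGER PRIMARY KEY,
    user_name TEXT NOT NULL
);
CREATE TABLE topics (
    topic_id  INTEGER PRIMARY KEY,
    author_id INTEGER NOT NULL REFERENCES users(user_id),
    title     TEXT NOT NULL
);
""")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])
conn.executemany(
    "INSERT INTO topics VALUES (?, ?, ?)",
    [(10, 1, "Login issue"), (11, 1, "Feature request"), (12, 2, "Billing question")],
)

# Efficient updates: the user name is stored once, so renaming the user
# is a single-row update rather than one update per topic.
conn.execute("UPDATE users SET user_name = 'alice_w' WHERE user_id = 1")

# More complex queries: showing each topic with its author's name needs a join.
rows = conn.execute("""
    SELECT t.title, u.user_name
    FROM topics AS t
    JOIN users AS u ON u.user_id = t.author_id
    ORDER BY t.topic_id
""").fetchall()
for title, author in rows:
    print(f"{title}: {author}")
```

Because the user name lives in exactly one row, the rename touches one row, yet both of that user's topics pick up the new name through the join.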
What are the challenges of using normalized data for analytics?
- Analytical queries on normalized data can be complicated. To extract meaningful insights, you often need to combine multiple table joins, subqueries, and aggregations. Writing and optimizing these complex SQL queries requires advanced skills.
- Analytical queries on normalized data can be slower than queries on denormalized data because of the multiple joins involved. Improving query performance requires expertise in indexing, query optimization, and database tuning.
- Analyzing normalized data demands a deep understanding of data relationships and the database schema, making it more challenging for analysts not intimately familiar with the data structure. For more information, refer to this article.
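As an illustration of the first challenge, the following sketch (again SQLite as a stand-in, with hypothetical categories, topics, and replies tables) answers a simple-sounding question, average replies per topic by category. On normalized data it already takes two joins, a subquery, and two levels of aggregation.

```python
import sqlite3

# Hypothetical three-table normalized schema (not the real CC schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categories (category_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE topics (
    topic_id    INTEGER PRIMARY KEY,
    category_id INTEGER NOT NULL REFERENCES categories(category_id)
);
CREATE TABLE replies (
    reply_id INTEGER PRIMARY KEY,
    topic_id INTEGER NOT NULL REFERENCES topics(topic_id)
);
INSERT INTO categories VALUES (1, 'Support'), (2, 'Ideas');
INSERT INTO topics VALUES (10, 1), (11, 1), (12, 2);
INSERT INTO replies VALUES (100, 10), (101, 10), (102, 11), (103, 12);
""")

# Average replies per topic, by category: the subquery counts replies per
# topic, then the outer query joins up to categories and averages.
query = """
SELECT c.name,
       AVG(per_topic.reply_count) AS avg_replies
FROM categories AS c
JOIN topics AS t ON t.category_id = c.category_id
JOIN (
    SELECT topic_id, COUNT(*) AS reply_count
    FROM replies
    GROUP BY topic_id
) AS per_topic ON per_topic.topic_id = t.topic_id
GROUP BY c.name
ORDER BY c.name
"""
result = conn.execute(query).fetchall()
print(result)  # prints: [('Ideas', 1.0), ('Support', 1.5)]
```

A typical pitfall: the inner join to the subquery silently drops topics that have no replies, so they are not counted as zero in the average. Counting them correctly requires a LEFT JOIN with COALESCE(per_topic.reply_count, 0), the kind of detail that makes these queries demanding.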
Why does CC not offer denormalized data in the Customer Communities Data Lake Silver Layer?
- The CC Data Lake Silver Layer is one step in the broader CC Data Lake project.
- We designed the Silver Layer to maintain data integrity and efficiency. The planned Golden Layer will serve analytics and reporting needs by offering denormalized data for better query performance and simpler query construction.
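The Golden Layer is still planned, so its actual design is not described here. Purely as an illustration of what denormalization means, this sketch (SQLite stand-in, hypothetical table names) pre-joins normalized tables once into a flat, hypothetical topic_facts table, after which the analytical question needs no joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categories (category_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE topics (topic_id INTEGER PRIMARY KEY, category_id INTEGER NOT NULL, title TEXT NOT NULL);
CREATE TABLE replies (reply_id INTEGER PRIMARY KEY, topic_id INTEGER NOT NULL);
INSERT INTO categories VALUES (1, 'Support'), (2, 'Ideas');
INSERT INTO topics VALUES (10, 1, 'Login issue'), (11, 1, 'Billing question'), (12, 2, 'Dark mode');
INSERT INTO replies VALUES (100, 10), (101, 10), (102, 12);

-- Denormalization step: join once into a flat, wide table.
-- LEFT JOIN keeps topics with no replies (reply_count = 0).
CREATE TABLE topic_facts AS
SELECT t.topic_id,
       t.title,
       c.name AS category_name,
       COUNT(r.reply_id) AS reply_count
FROM topics AS t
JOIN categories AS c ON c.category_id = t.category_id
LEFT JOIN replies AS r ON r.topic_id = t.topic_id
GROUP BY t.topic_id, t.title, c.name;
""")

# Analysts now query a single table with no joins.
rows = conn.execute("""
    SELECT category_name, SUM(reply_count)
    FROM topic_facts
    GROUP BY category_name
    ORDER BY category_name
""").fetchall()
print(rows)  # prints: [('Ideas', 1), ('Support', 2)]
```

The trade-off is redundancy: the category name is now repeated on every topic row, so renaming a category means updating many rows. Keeping the Silver Layer normalized avoids that risk, while a separate denormalized layer can favor query simplicity.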
Which BI tools can you connect to the CC Data Lake Silver Layer?
Any BI tool that can connect to Amazon Athena. For your convenience, we provide guidance and dedicated connectors for:
- Power BI
- Tableau
- Looker
Are there any limits on the volume of data processed or the number of queries?
Each individual query is limited to 500 MB of data scanned. Please contact us if this limit prevents you from querying the data you need.
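If you run queries programmatically through Athena, you can check how close a query came to the limit. Athena reports the bytes a query scanned in the Statistics.DataScannedInBytes field of the boto3 get_query_execution response. The minimal sketch below reads that field from a response-shaped dictionary and compares it with the limit. It assumes 500 MB means 500 * 1024 * 1024 bytes, which the limit statement does not specify, and the sample response is illustrative rather than real output.

```python
# Assumption: "500 MB" is taken as 500 * 1024 * 1024 bytes; the exact
# unit behind the CC Data Lake limit is not stated.
SCAN_LIMIT_BYTES = 500 * 1024 * 1024


def scan_usage(query_execution: dict) -> tuple[int, float]:
    """Return (bytes scanned, fraction of the per-query limit used).

    `query_execution` has the shape of the response returned by boto3's
    Athena client method `get_query_execution(QueryExecutionId=...)`.
    """
    stats = query_execution["QueryExecution"].get("Statistics", {})
    # Defensive default in case the statistics field is missing.
    scanned = stats.get("DataScannedInBytes", 0)
    return scanned, scanned / SCAN_LIMIT_BYTES


# Illustrative response fragment, not real output.
sample = {"QueryExecution": {"Statistics": {"DataScannedInBytes": 262_144_000}}}
scanned, fraction = scan_usage(sample)
print(f"{scanned} bytes scanned, {fraction:.0%} of the limit")
# prints: 262144000 bytes scanned, 50% of the limit
```

Keeping the check as a pure function over the response dictionary means it can be tested without AWS credentials; in practice the dictionary would come from a real get_query_execution call.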
Is data in the CC Data Lake Silver Layer complete?
The CC Data Lake Silver Layer contains the data objects most relevant for analytics. As our platform evolves, we will keep adding new data objects to the Data Lake. If a data object you need is missing from the Data Lake, please contact us; we welcome your inquiries about data availability and are here to assist you.
Where can I find the data catalog?
The data catalog is available here.
Where can I find the CC Data Lake Silver Layer data model?
The visualization of the data model is available here [password to open the document: GSCCDataModel].