Building Enterprise Data Warehouse in Google Cloud using Low Code
One of Google Cloud customers is building a data warehouse in the Google Cloud by ingesting data from enterprise applications such as NetSuite, Workday, Harvest, Smartsheet, CSV files, etc. An analytics team connects to the warehouse using BI tools such as Tableau for analysis.
The use case presented several challenges:
- The pipeline should support enterprise security and governance policies.
- The system should provide flexible scheduling up to the table level.
- Some tables required custom handling for exceptional cases, such as handling physically deleted records.
- Flexibility to incorporate logic in the pipeline.
- A user interface to correct and add missing data.
Integration Platform as a Service
The initial goal was to use an Integration Platform such as StichData, and Fivetran. Soon, we realized that they limited the flexibility for scripting specific requirements.
Trillo Workbench for Integration
Trillo Workbench, a server-side low code platform on the Google Platform, met all requirements — primarily due to its programmability using scripts. Trillo Workbench provides the following relevant services:
- Trillo Workbench can connect with external applications using OAuth2, OAuth1, JDBC for on-premise databases, and SFTP for file transfer.
- Trillo Workbench can automatically infer schema and auto-create it in the data warehouse database (Cloud SQL and BigQuery). It simultaneously creates a JSON file for the metadata of the database tables. Users can modify this file to fine-tune the tables, such as specifying the length of the column, change type, uniqueness, indexing, etc.
- Trillo Workbench provides APIs to pull paginated data from the source system and batch writes into the destination tables.
- Trillo Workbench provides scripting capability to write code using its APIs to pull data, transform it and write it into destination tables. The flexibility to program pipelines using 100–200 lines of code is a powerful feature to handle special requirements.
- Trillo Workbench can manage credentials securely. It can layer role-based access control on top of the data in the warehouse.
- Trillo Workbench provides a scheduler and workflow to run the pipeline at a required interval.
- Trillo Workbench provides audit logs and alerts for monitoring the pipeline.
- Trillo Workbench provides UI to insert and edit records.
- Trillo Workbench schedules queries and stored procedures to create materialized views or data mart tables.
The following diagram shows the resulting architecture of the pipeline.
Conclusion
In phase 1, the pipeline is operational for the Harvest, Workday, NetSuite, Smartsheet, and CSV files. In phase 2, the customer will integrate several other systems.
Trillo Workbench provides a low code (hence low cost and less time) option for ingesting data from enterprise applications and files into a data warehouse in Google.