What is DI?
What is Data Integration? It is typically described as the combination of technical and business processes used to combine data from disparate sources into meaningful, unified information. It is also the movement and sharing of digital information between systems, both internal and external to an organisation. This is a very common use case for cloud-based applications, where information needs to be shared between applications both in the cloud and on premises.
Techniques and technologies associated with DI include Extract Transform Load (ETL), Extract Load Transform (ELT), Enterprise Service Bus (ESB), messaging platforms, Kafka, Change Data Capture (CDC), Data Virtualisation, the Internet of Things (IoT) and many more.
A complete data integration solution delivers trusted data from various sources to support a business-ready data pipeline for DataOps, where DataOps is the management and control of those data flows and processes. This in turn provides users with a real-time view of business performance.
Data integration has been around for many years and has evolved as technology has advanced, but at its heart it is a strategy: the first step in transforming data into meaningful and valuable information.
DI – A Brief History
Modern Data Integration started with the advent of the Data Warehouse. It was the first time that we tried to integrate data from all of the different systems across an organisation. The traditional procedural approach of scripting and coding was too slow and cumbersome for creating all of the integration data flows. This led to the creation of modern, data-centric ETL tools, which made it possible to develop and deploy scalable data flows quickly and with virtually no code.
While ETL tools continue to be key, other tools and approaches have been created to address the growing challenges of data integration. We now have messaging platforms, change data capture and data virtualisation to name a few.
Modern integration solutions need to be able to connect natively to a broad range of applications and data sources, and to transform or convert many data formats while delivering trusted data. They must support both full-load and incremental data extraction, and handle changed, late-arriving and out-of-sequence data. They must also scale to consume continually growing data volumes and meet expectations of real-time data availability.
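To make the difference between full-load and incremental extraction concrete, here is a minimal Python sketch. It is an illustration under stated assumptions rather than how any particular tool works: the `Row` type, the `updated_at` timestamp column, the high-watermark approach and the `lookback` window are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Row:
    key: str
    value: str
    updated_at: datetime  # assumed: the source records when each row last changed


def full_load(source):
    """Full load: take every row from the source."""
    return list(source)


def incremental_extract(source, watermark, lookback=timedelta(0)):
    """Incremental load: take only rows changed after the watermark.

    The lookback window re-reads a little history so that rows which
    arrive late, carrying an older timestamp, are still picked up.
    """
    cutoff = watermark - lookback
    rows = [r for r in source if r.updated_at > cutoff]
    new_watermark = max([watermark] + [r.updated_at for r in rows])
    return rows, new_watermark


def apply_changes(target, rows):
    """Upsert rows into the target, keyed by `key`.

    A row only replaces the stored version if it is strictly newer, so
    out-of-sequence or re-read rows cannot overwrite fresher data.
    """
    for r in rows:
        current = target.get(r.key)
        if current is None or r.updated_at > current.updated_at:
            target[r.key] = r
    return target


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1)
    source = [Row("a", "v1", t0), Row("b", "v1", t0 + timedelta(hours=1))]

    # Initial full load, then remember the newest timestamp seen.
    target = apply_changes({}, full_load(source))
    watermark = max(r.updated_at for r in source)

    # Later: one genuine update and one late-arriving, older change.
    source.append(Row("a", "v2", t0 + timedelta(hours=2)))
    source.append(Row("b", "stale", t0 + timedelta(minutes=30)))

    rows, watermark = incremental_extract(source, watermark, lookback=timedelta(hours=1))
    apply_changes(target, rows)
    print({k: r.value for k, r in target.items()})
```

The lookback window trades some repeated reads for tolerance of late data. Because `apply_changes` keeps only strictly newer versions, re-reading rows that have already been loaded is harmless.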
DI – Use Cases
As the demands of organisations have changed over the years, so have the capabilities of Data Integration tools and techniques. Data Integration has never been more important:
- It underpins the digital transformation projects that many organisations are undertaking to improve the agility and accessibility of their systems.
- Data Warehousing, Data Lakes, AI and data science all require trusted, unified views of an organisation's data.
- Together with Data Governance, it is fundamental to AI: without trusted Data Integration there is no AI.
- The world of cloud-based enterprise applications depends on Data Integration, which enables organisations to integrate on-premises, off-premises and intra-cloud applications.
- A good data integration platform mitigates the risk of vendor or application lock-in by reducing the impact of changing applications. We have many customers whose Data Integration capability has outlived many of their core applications and has been a fundamental tool in their migration.
Where is Data Integration used? The list is almost endless, but typically you will see:
- Data acquisition and unification for analytics and business intelligence (BI), Data Lakes and Data Warehouses
- Integrating cloud-based applications
- Support for Data governance and management of data assets
- Sourcing and delivery of master data in support of master data management (MDM) or Operational Data Stores (ODS)
- Data consistency between operational (business) applications
- Business Continuity
- Digital Transformation
- Inter-enterprise data sharing