As a Data Management Consultant, I often get asked which of Collibra’s product offerings I believe to be the most important. I always include data lineage in my answer. For those of you who don’t know, lineage is one of many capabilities that Collibra provides as part of the Collibra Catalog. The company also offers the Collibra Platform, Collibra Governance and Collibra Privacy & Risk.
Data lineage done right provides everyone with a clear and concise story around the end-to-end journey of data in an organisation.
A popular metaphor for data used in the Collibra space is water. Why? Because every organisation should aim for their data to be like water: clear, usable, and available to everyone in the business, quenching the thirst for data insights.
Alvin Useree, Data Management Consultant
What is Data Lineage?
Data lineage simply defines where the data has come from and where it is going. Signposting all the different systems, business processes and data custodians involved along the way tells a great story of how our data is used in the organisation!
A good understanding of our data’s lineage enables a better understanding of:
With complete visibility of these facets, users of the data can make better decisions about whether the data is fit for purpose, how to access the data and whether they can make changes to the data (as well as who they need to inform beforehand!).
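To make the "where it has come from and where it is going" idea concrete, here is a minimal sketch that models lineage as a directed graph and walks it in both directions. It is purely illustrative: the `LineageGraph` class and the asset names are hypothetical and have nothing to do with Collibra's own API or data model.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Toy lineage model: assets are strings, flows are directed edges."""

    def __init__(self):
        self._downstream = defaultdict(set)  # source -> direct targets
        self._upstream = defaultdict(set)    # target -> direct sources

    def add_flow(self, source, target):
        """Record that data moves from `source` to `target`."""
        self._downstream[source].add(target)
        self._upstream[target].add(source)

    @staticmethod
    def _walk(start, edges):
        """Breadth-first walk that collects every asset reachable from `start`."""
        seen = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in edges.get(node, ()):
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        return seen - {start}

    def upstream(self, asset):
        """Where has this data come from?"""
        return self._walk(asset, self._upstream)

    def downstream(self, asset):
        """Where is this data going?"""
        return self._walk(asset, self._downstream)


# Hypothetical assets, invented for this example only.
graph = LineageGraph()
graph.add_flow("crm.customers", "staging.customers")
graph.add_flow("staging.customers", "warehouse.dim_customer")
graph.add_flow("warehouse.dim_customer", "report.churn_kpi")
graph.add_flow("billing.invoices", "warehouse.fact_revenue")
graph.add_flow("warehouse.fact_revenue", "report.churn_kpi")

print(sorted(graph.upstream("report.churn_kpi")))
print(sorted(graph.downstream("crm.customers")))
```

Asking for the upstream of a KPI answers "can I trust this number?", while asking for the downstream of a source table answers "who do I need to inform before I change it?".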
Why Collibra?
Collibra Lineage is my preferred product to represent lineage: the visual aspect of the traceability offering provides a clear and easy-to-understand representation of the data journey. The interactivity of the diagrams enables us to give users a high-level view of the data and lets them explore it at a more granular level, all in real time.
The diagrams are flexible enough to be designed to cater to multiple audiences:
- Data Analysts: Individuals who care about the granular story of the data and are generally interested in the physical lineage
- Non-Technical: Individuals who care about the data’s journey, but care more about the bigger picture and do not want to go too granular (generally focussed on the logical lineage)
- Business Focused: These are individuals that generally care about how the data in the organisation flows to their business assets (e.g. Metrics, KPIs etc.)
Tip of the day:
You can take regular snapshots of your lineage using the “Pictures” functionality to provide users with a historical view of the data lineage over time – this can tell an awesome story around changes made in the organisation!
So, What Makes a Good Lineage Diagram?
In my opinion, a good lineage diagram should:
- Cater for the Target Audience: Go granular if necessary or stay high level to give a view of the bigger picture
- Utilise Boxed Nodes: Hide any data your end-users do not want to see straight away and teach them how to interact with the diagrams to explore a more granular view
- Leverage Filters Where Appropriate: You can segment your views for certain business use cases, e.g. you can have a GDPR view and a BCBS 239 view using filters
- Use Overlays: Overlay your lineage diagrams with additional content to help contextualise the journey of the data, e.g. adding an overlay to identify the security classification of a column
- Consider Loading Times: A lineage diagram that takes an hour to load provides little benefit to the business. Think about how you can reduce the size of the diagram, perhaps by utilising the new lazy load feature or taking a snapshot of the diagram
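For readers who think in code, the filter and overlay ideas in the list above can be sketched as two small operations on a set of assets and flows. This is a conceptual illustration only, not how Collibra implements either feature: the `Asset` class, the `filter_view` and `overlay_classification` functions, and all asset names are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    """Hypothetical lineage node with just enough metadata for the example."""
    name: str
    regulations: frozenset = frozenset()
    classification: str = "Unclassified"


def filter_view(assets, flows, regulation):
    """Filter: keep only the flows whose two ends are both in scope for `regulation`."""
    in_scope = {a.name for a in assets if regulation in a.regulations}
    return [(src, dst) for src, dst in flows if src in in_scope and dst in in_scope]


def overlay_classification(assets):
    """Overlay: decorate each node label with its security classification."""
    return {a.name: f"{a.name} [{a.classification}]" for a in assets}


# Invented sample data.
assets = [
    Asset("crm.customers.email", frozenset({"GDPR"}), "Confidential"),
    Asset("warehouse.dim_customer.email", frozenset({"GDPR"}), "Confidential"),
    Asset("risk.exposure.amount", frozenset({"BCBS 239"}), "Internal"),
    Asset("report.risk_summary.total", frozenset({"BCBS 239"}), "Internal"),
]
flows = [
    ("crm.customers.email", "warehouse.dim_customer.email"),
    ("risk.exposure.amount", "report.risk_summary.total"),
]

gdpr_view = filter_view(assets, flows, "GDPR")
labels = overlay_classification(assets)

print(gdpr_view)
print(labels["crm.customers.email"])
```

The filter shrinks the diagram to what one audience cares about, which also helps with loading times, while the overlay adds context without changing the shape of the lineage.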
I’ve created an example lineage use case that takes into consideration the target audience and aims to answer some of those difficult questions around the data.
Technical Lineage Example:
Technical lineage simply refers to the tangible data held in an organisation and the data that holds it together. Think Databases, Tables, Columns, APIs etc.
Target Audience: The audience for this diagram is Data Analysts. These are the individuals we expect to actually use the data, so they can make use of a more granular diagram.
Questions Answered:
- Who owns the data?
- Who do I need to talk to about getting access to it?
- How useful has it been for other analysts?
- What is the security classification of the data?
- Where is the data currently being hosted?
- How are other analysts currently making use of the data?
- What APIs are available to query the data?
Conclusion
Regardless of whether you choose to use Collibra to represent your lineage, understanding your data lineage can be a huge enabler in allowing those who use data to truly understand and explore it.
Quench that thirst: Unlock the power of your lineage now and let your data flow like water!