AI doesn't replace data modeling; it makes it way more important, more useful, and easier to do.
Yes. My last bigger project was an eye-opener for me. All of a sudden I can't even trust the basic info provided, because so many people don't even check what they have sent to me. If you understand the underpinnings, you are a lot more useful than a 'prompt engineer' (quotation marks intended).
Nope. Data modeling is inherent to having information systems.
The reason the author found that data modeling is 'dead' is that the Modern Data Stack promised that you could transform your data later, and so many people never got around to that. Long live the data swamp!
Every bucket of data is implicitly or explicitly the result of an act of data modeling, some more intentional than others.
I would say that easy access to previously unthinkable amounts of storage and compute (and, obviously, the network throughput to tie it all together) is thought to make data modeling unnecessary. Normalized/denormalized data models and Inmon/Kimball architectures were largely dictated by limits of compute and storage that are no longer relevant (see the sketch below this comment for that trade-off in miniature).
What is forgotten is data governance and data quality, which results in, yes, data swamps as far as the eye can see and hordes of "data scientists" roaming around hoping to find actionable "gems".
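To make that storage trade-off concrete, here is a minimal sketch (all table and column names are invented for illustration, not taken from any real system) of the same orders data stored once in normalized tables and once as a single wide, denormalized table of the kind analytics marts flatten into:

```python
import sqlite3

# Minimal sketch: the same data, normalized vs. denormalized.
# Table and column names are illustrative only.
con = sqlite3.connect(":memory:")
cur = con.cursor()

cur.executescript("""
-- Normalized: customer attributes stored once, referenced by key.
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT,
    country     TEXT
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer(customer_id),
    amount      REAL
);

-- Denormalized: customer attributes repeated on every order row,
-- trading storage (cheap today, scarce back then) for simpler reads.
CREATE TABLE orders_wide (
    order_id      INTEGER PRIMARY KEY,
    customer_name TEXT,
    country       TEXT,
    amount        REAL
);
""")

cur.execute("INSERT INTO customer VALUES (1, 'Acme', 'DE')")
cur.executemany("INSERT INTO orders VALUES (?, 1, ?)",
                [(i, 10.0 * i) for i in range(1, 4)])

# Populate the wide table by joining once at load time.
cur.execute("""
    INSERT INTO orders_wide
    SELECT o.order_id, c.name, c.country, o.amount
    FROM orders o JOIN customer c USING (customer_id)
""")

# The analytical query is a plain scan on the wide table...
print(cur.execute(
    "SELECT country, SUM(amount) FROM orders_wide GROUP BY country"
).fetchall())
# ...and a join on the normalized tables.
print(cur.execute("""
    SELECT c.country, SUM(o.amount)
    FROM orders o JOIN customer c USING (customer_id)
    GROUP BY c.country
""").fetchall())
```

When storage and compute were the constraint, which of these two shapes you kept was a real design decision; now both fit comfortably, which is exactly why the decision often doesn't get made at all.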
I’m not sure I follow, though I like the tone. What has data modeling been replaced by?
Not OP, but in a similar boat. My 2 cents:
Well-thought-out, sophisticated ways of modeling data for analytics purposes, using established approaches, are being replaced by just pulling data from the source systems into cloud data platforms, with barely any change to the source structure.
In the past we used to model layers in a data-warehousing infrastructure, each with a purpose and a data modelling methodology. For instance, an operational data store (ODS) layer integrating data from all the sources, with a normalized data structure; then a set of datamarts, each containing a subset of the ODS content in a denormalized format and each focused on a specific functional domain (there is a toy sketch of this layering after this comment).
We had rules and methods for structuring data in order to get performant reporting, and a customer orientation.
Coming from this world, it seems like data governance principles are gone, and it feels like some organisations use the modern data stack the same way each analyst would keep their own Excel files in their own corner, without any safeguards.
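A toy sketch of that layering, with invented names and nothing from any real warehouse: raw staging tables from two sources, an integrated and normalized ODS, and a denormalized datamart view for one reporting domain.

```python
import sqlite3

# Toy sketch of the classic layering: staging -> ODS -> datamart.
# All names are invented for illustration.
con = sqlite3.connect(":memory:")
cur = con.cursor()

cur.executescript("""
-- Staging: raw extracts from two source systems, loaded as-is.
CREATE TABLE stg_crm_customers (crm_id TEXT, full_name TEXT);
CREATE TABLE stg_erp_invoices  (invoice_no TEXT, crm_ref TEXT, net_amount REAL);

-- ODS: integrated, normalized view of the business entities.
CREATE TABLE ods_customer (customer_key INTEGER PRIMARY KEY, source_id TEXT, name TEXT);
CREATE TABLE ods_invoice  (invoice_key INTEGER PRIMARY KEY, customer_key INTEGER, amount REAL);
""")

cur.executemany("INSERT INTO stg_crm_customers VALUES (?, ?)",
                [("C1", "Acme GmbH"), ("C2", "Globex")])
cur.executemany("INSERT INTO stg_erp_invoices VALUES (?, ?, ?)",
                [("I-1", "C1", 100.0), ("I-2", "C1", 50.0), ("I-3", "C2", 75.0)])

# Integrate the sources into the ODS: keys assigned here, cleansing would live here too.
cur.execute("INSERT INTO ods_customer (source_id, name) "
            "SELECT crm_id, full_name FROM stg_crm_customers")
cur.execute("""
    INSERT INTO ods_invoice (customer_key, amount)
    SELECT c.customer_key, i.net_amount
    FROM stg_erp_invoices i JOIN ods_customer c ON c.source_id = i.crm_ref
""")

# Datamart: denormalized, domain-focused, built for reporting.
cur.execute("""
    CREATE VIEW dm_sales_by_customer AS
    SELECT c.name AS customer, COUNT(*) AS invoices, SUM(i.amount) AS revenue
    FROM ods_invoice i JOIN ods_customer c USING (customer_key)
    GROUP BY c.name
""")
print(cur.execute("SELECT * FROM dm_sales_by_customer").fetchall())
```

Each layer has a stated purpose and a modeling rule; the "modern" shortcut is to skip the middle two steps and point the BI tool at the staging tables.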
Throwing shit at the wall, mostly. "Here's an S3 bucket of line-separated .json blobs that have a consistent format sometimes! Good luck!"
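A defensive reader for that kind of bucket tends to end up looking like this sketch (the `event_type` field and the newline-delimited format are assumptions for illustration; in practice the lines would be streamed from S3 rather than a local string):

```python
import json

def read_jsonl(lines):
    """Yield (record, error) pairs from line-separated JSON of dubious quality."""
    for lineno, raw in enumerate(lines, start=1):
        raw = raw.strip()
        if not raw:
            continue  # blank lines happen
        try:
            record = json.loads(raw)
        except json.JSONDecodeError as exc:
            yield None, f"line {lineno}: not valid JSON ({exc})"
            continue
        # "Consistent format sometimes": validate the fields you actually need.
        if not isinstance(record, dict) or "event_type" not in record:
            yield None, f"line {lineno}: missing expected field 'event_type'"
            continue
        yield record, None

# Example input: one good record, one broken line, one with a missing field.
blob = '{"event_type": "click", "ts": 1}\nnot json at all\n{"ts": 2}\n'
for record, error in read_jsonl(blob.splitlines()):
    print(record if error is None else f"skipped -> {error}")
```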
By vibe graphing, probably.
The core issue is that dimensional data modeling was introduced to address hardware limitations (disk drives) and limited capacity.
With the advent of effectively unlimited storage and the separation of compute and storage, dimensional data modeling is only really viable where there is strong data governance, in a system like SAP or through a CoE (center of excellence).
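Back-of-the-envelope arithmetic for that claim; the row counts and byte sizes below are made-up illustrations of the trade-off a star schema was buying when disks were small:

```python
# Back-of-the-envelope: why repeating dimension attributes on every fact row
# hurt when disks were small. All numbers are illustrative assumptions.
fact_rows           = 100_000_000   # e.g. one row per sale
customers           = 1_000_000
customer_attr_bytes = 200           # name, address, segment, ... per customer
surrogate_key_bytes = 4             # integer key carried on the fact row

denormalized = fact_rows * customer_attr_bytes
dimensional  = fact_rows * surrogate_key_bytes + customers * customer_attr_bytes

print(f"attributes repeated on every fact row: {denormalized / 1e9:.1f} GB")
print(f"star schema (key + dimension table):  {dimensional / 1e9:.1f} GB")
# Roughly 20 GB vs 0.6 GB: a big deal on 1990s disks, noise on today's object storage.
```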