Revisiting the principles of data-oriented programming

blog.klipse.tech

126 points by viebel 3 years ago

bob1029 3 years ago

I believe the easiest way to think about it is to get away from your programming tools and to start modeling your problem domain as tables in Excel.

Once you have a relational schema that the business can look at and understand, then you go implement it with whatever tools and techniques you see fit.

This is what “data-oriented” programming means to me. It’s not some elegant code abstraction. It’s mostly just a process involving people and business.

Even for non-serious business, these techniques can wrangle complexity that would otherwise be insurmountable.

I still think the central piece of magic is embracing a relational model. This allows for things like circular dependencies to be modeled exactly as they are in reality.

jackosdev 3 years ago

This confused me as when I hear data-orientated I think data structures that are optimised around minimising CPU cache misses by using better alignments, using enums where possible, not storing results of simple calculations etc. There is a popular book and popular talks on the subject. Probably confuses other people as well I'd imagine
- bob1029 3 years ago
  
  > I think data structures that are optimised around minimising CPU cache misses by using better alignments, using enums where possible, not storing results of simple calculations etc.
  You might be surprised to learn that modeling your problem in terms of normalized relational tables ultimately achieves similar objectives. The more normalized, the more packed you will find their in-memory representations.
- viebel 3 years ago
  
  DOP is not the same as DOD [1]
  1: https://blog.klipse.tech/visualization/2021/02/16/data-relat...

alphanumeric0 3 years ago

I'm having a hard time thinking of a way code can ever be fully decoupled from data. When we decide it's better to have a name field rather than firstName and lastName does that mean we simplify NameCalculation.fullName to just return data.name? This seems to suggest we still have coupled code to data (the data structure being an object), it's just now a coupled function, but you have decoupled it enough to use NameCalculation in different contexts. Single responsibility classes are already recommended for reuse like this in OO.

Also when it comes to data validation, OO performs all of this validation too and in a much more compact and code-oriented, extensible way. Why would I write a separate schema when the object itself knows what it will accept, what is optional, and what range the values should be in? I'd imagine the schema and code could become incoherent.

weavejester 3 years ago

Just imagine you're sending the data across a network, instead of between local functions. If you have a web service that spits out JSON, then you have data that is decoupled from code. That's not to say that the JSON data isn't then read and manipulated by code; just that no specific code is associated with the data.
As for why you'd want to do this, well, one reason is that it makes it easier to bounce data between different services. You don't need to perform any sort of conversion if you're operating directly on the data you're receiving and sending.
The second argument for this style is perhaps more ideological. In the Clojure community in particular, complexity is seen as arising from coupled components. The more things you can decouple, the less complex your codebase. The less complex your codebase, the more reliable and extensible it is.
Edit: another potential advantage is that it's easier to use generic functions to interrogate and manipulate data that isn't encapsulated in specific types or objects.
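A minimal Python sketch of the "generic functions" point, assuming a made-up author payload (the field names and the helper `select_keys` are hypothetical, not from the article). The decoded JSON is a plain dict, so the same helpers work on any record and the data can be sent on again without conversion:

```python
import json

# Hypothetical payload, as it might arrive from a web service.
payload = '{"firstName": "Isaac", "lastName": "Asimov", "books": 500}'

author = json.loads(payload)  # a plain dict, with no class attached


def select_keys(record, keys):
    """Generic helper: works on any dict, not just authors."""
    return {k: record[k] for k in keys if k in record}


def full_name(record):
    """Depends on two field names only, not on a specific type."""
    return f"{record['firstName']} {record['lastName']}"


name_only = select_keys(author, ["firstName", "lastName"])
round_trip = json.loads(json.dumps(author))  # re-serialize without any mapping layer
```

The code still depends on the field names, which is theteapot's objection below; what goes away is the dependency on a particular class.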
- theteapot 3 years ago
  
  > Just imagine you're sending the data across a network, instead of between local functions. If you have a web service that spits out JSON, then you have data that is decoupled from code. That's not to say that the JSON data isn't then read and manipulated by code; just that no specific code is associated with the data.
  That's not really true in classical OO or DOP. There is always code that depends on specifics of some data. In classical OO it's extremely common to de-marshal data straight off the wire and into a class (AKA hydration). From then on the thing that interacts with the data structure directly is the object instance.
skippyboxedhero 3 years ago

Because the problem in OO is if you have any kind of cross-cutting concern, then it collapses totally.
For example, I have a Wizard object. My wizard has a wand, we store the wand object on our Wizard object. Simple. But then my wizard casts a spell, and does damage to a goblin. Do we put the cast method on the wand, the wizard? There is no real reason to pick one over the other (this problem comes up a lot with game development, which is why this pattern is more common there...Spring is another example, aspect-oriented programming/dependency injection works from similar principles). It is far easier to separate that out totally and have a pure, reusable cast function that takes the wizard, goblin, and weapon.
Another aspect of this problem (which Rust, as an example, makes clear) is that you introduce runtime bugs or hurt performance when you start carrying around a lot of references everywhere. Once you start to think about what actually needs a reference to another object (in Rust, this is limited by the borrow checker) then you realise why OOP doesn't work in some cases.
OO doesn't perform validation, your code performs validation on the data. You can write a separate schema, you can write one schema, but the problem is that OOP tries to fit a round peg in a square hole with some applications.
Very generally, it is harder to make mistakes with a data-oriented style. Code with a lot of calculations or interactions ends up pure, easy to test, and close to how people think about those elements (financial applications are one area where I have found this; I worked the approach out while building finance-related software and only later found out that data-oriented programming existed). In these cases, introducing OOP means state changing in unpredictable ways (and then someone joins the project, doesn't understand the abstraction, calls a misleadingly named method, and it all goes wrong).
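To make the "pure, reusable cast function" concrete, here is a rough Python sketch. The record fields (`mana`, `mana_cost`, `damage`, `hp`) are assumptions invented for the example; the only point is that `cast` belongs to neither the wizard nor the wand and returns new records instead of mutating its inputs:

```python
def cast(wizard, wand, goblin):
    """Pure function: returns updated copies of the wizard and the goblin."""
    if wizard["mana"] < wand["mana_cost"]:
        return wizard, goblin  # not enough mana, so nothing changes
    new_wizard = {**wizard, "mana": wizard["mana"] - wand["mana_cost"]}
    new_goblin = {**goblin, "hp": max(0, goblin["hp"] - wand["damage"])}
    return new_wizard, new_goblin


# Hypothetical game state, kept as plain records.
wizard = {"name": "Merlin", "mana": 10}
wand = {"mana_cost": 4, "damage": 7}
goblin = {"name": "Grub", "hp": 5}

wizard_after, goblin_after = cast(wizard, wand, goblin)
```

Because the inputs are left untouched, a test only has to compare the returned records with the expected ones.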
- flarg 3 years ago
  
  You have a spell object
- treis 3 years ago
  
  >But then my wizard casts a spell, and does damage to a goblin. Do we put the cast method on the wand, the wizard? There is no real reason to pick one over the other
  You did pick: "My wizard casts a spell". The cast method goes on the wizard.
  
  411111111111111 3 years ago
  
  Unless cast is an ability unlocked by the wand, then you'd only have "use main/offhand" on the person, and the cast would be on the wand.
  It's really not as cut and dry as you said.
  
  yxhuvud 3 years ago
  
  Another way to phrase it is to note that the question is answered by the question of what happens when the wand is dropped and the goblin picks up the wand. Can it cast the spell? Then it belongs to the wand. If not then it belongs to the wizard.
  In this case it just means you have to match your game semantics to the decision of where to put the ability. That doesn't mean it would be a good idea to perform fighting by simple method calls on the objects though. It is very likely one would want to extract the abilities granted by skills and objects to a separate entity that changes seldom enough.
  That doesn't mean I see what problems a schema solves here. That mostly seems like unnecessary indirection.
  Also note I don't try to refute the problem in general, just this specific example. Maths has lots of examples that don't have the semantic baggage that makes the question solvable.
  
  411111111111111 3 years ago
  
  You haven't convinced me with that at all, honestly.
  You can also have the spell on the wand and require n mana to be put into it for activation (argument), now warrior type goblins won't be able to use it either.
  There is no "correct way" with OOP: There are as many ways as you've got the energy to think about and all of them are leaky somewhere. You can still write maintainable code with it, but making it sound clear and obvious where what should go is just wrong. It heavily depends on a multitude of factors and can get nauseatingly complex at times
  
  rawoke083600 3 years ago
  
  This! Each answer here as to where to put it is different or breaks down in different scenarios.
  While none of the OOP solutions are outright wrong, none of them is "obviously right" either.
  
  zasdffaa 3 years ago
  
  Multiple dispatch solves that question. I wish it was better known. Dispatch is on both wizard (or generally, user) and wand (or generally, item used)
  
  zwkrt 3 years ago
  
  Right but now it’s also raining and a full moon and the goblin is a werewolf and wands also have AOE spells that hit multiple opponents, except when those opponents are blocking…
  Sure, the code kinda sucks either way, but the data oriented approach works exponentially better as the object interactions become more complicated. A “cast” function called as part of the event loop can look up all the game state in the state DB as it needs to. wand.cast(…) is a lot more brittle, ESPECIALLY once one wants to start reusing some of the code in sword.swing(), etc.
  
  zasdffaa 3 years ago
  
  > wand.cast(…) is a lot more brittle
  That's not how it works. Multiple dispatch looks like a normal procedure call, it would look like
  cast(item, user)
  or if you need to know it was raining
  cast(item, user, worldstate)
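For anyone who hasn't seen multiple dispatch: Python only ships single dispatch (`functools.singledispatch`), so the sketch below hand-rolls a tiny registry keyed on the run-time types of both arguments. The names `register`, `Wand`, `Wizard` and `Goblin` are assumptions for illustration, and the lookup order is deliberately naive compared with what languages with built-in multimethods do:

```python
_registry = {}


def register(item_type, user_type):
    """Register an implementation for one (item, user) type pair."""
    def decorator(fn):
        _registry[(item_type, user_type)] = fn
        return fn
    return decorator


def cast(item, user):
    """Pick an implementation from the run-time types of both arguments."""
    # Walk both MROs so that subclasses fall back to parent implementations.
    for item_cls in type(item).__mro__:
        for user_cls in type(user).__mro__:
            fn = _registry.get((item_cls, user_cls))
            if fn is not None:
                return fn(item, user)
    raise TypeError(f"no cast() for {type(item).__name__}, {type(user).__name__}")


class Wand: ...
class Wizard: ...
class Goblin: ...


@register(Wand, Wizard)
def _(item, user):
    return "fireball"


@register(Wand, Goblin)
def _(item, user):
    return "fizzle"
```

The call site stays an ordinary `cast(item, user)`; which body runs depends on both arguments rather than on a single receiver.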
  
  mattarm 3 years ago
  
  …and thus code is no longer sending _a_ message to _an_ object but using argument types to pick which function/procedure to call. The “cast” function is no longer coupled with any single class, so in effect this problem with OOP has been fixed by not using OOP concepts to solve the problem.
  Or, as I sometimes think of it, it can be an OOP design if multiple dispatch is actually a set of messages supported by an implicit, singleton, and usually hidden, “multiple dispatch“ object that handles the dispatch logic. You don’t have to write “dispatcher.cast(a, b, c)” but that is because the language provides the syntactic sugar (and often, an efficient implementation).
  
  zasdffaa 3 years ago
  
  > …and thus code is no longer sending _a_ message to _an_ object but using argument types to pick which function/procedure to call
  I dunno. From https://en.wikipedia.org/wiki/Multiple_dispatch
  "Multiple dispatch or multimethods is a feature of some programming languages in which a function or method can be dynamically dispatched based on the run-time (dynamic) type or, in the more general case, some other attribute of more than one of its arguments.[1] This is a generalization of single-dispatch polymorphism [...]"
  It's just a nicer OOP to me. *shrug* But thanks.
  
  treis 3 years ago
  
  >Right but now it’s also raining and a full moon and the goblin is a werewolf and wands also have AOE spells that hit multiple opponents, except when those opponents are blocking…
  >the data oriented approach works exponentially better as the object interactions become more complicated
  Maybe it's just me, but I still don't see the difference. To use your example, you'd have a function like:
  cast(caster, targets, weather, time, location)
  or you can create an object and call it something like Spell and do something like:
  class Spell
    def initialize(caster, targets, weather, time, location) ... end
    def valid_target? ... end
    def valid_caster? ... end
  end
  Perhaps it's just my mental conception of things. My mental model of a class is a data structure plus a bunch of functions that implicitly take the data structure as a parameter. I realize that there can be more to it than that but it works for the purposes of this discussion.
  Ultimately it's a code organization question either way. The original question is what class does it go in? Changing it to data structure + functions just changes the question to what module/file does the cast function go in? Maybe that's an easier question to answer but I guess I just don't see it.
  
  theteapot 3 years ago
  
  > My mental model of a class is a data structure plus a bunch of functions that implicitly take the data structure as a parameter.
  That's not OO. A class without instantiation is just namespacing.
  
  jhgb 3 years ago
  
  > Right but now it’s also raining and a full moon and the goblin is a werewolf and wands also have AOE spells that hit multiple opponents, except when those opponents are blocking
  What exactly does this change? In a well-written program you should be able to write code that adapts to context so that you don't have to pass absolutely everything.
  > Sure, the code kinda sucks either way, but the data oriented approach works exponentially better as the object interactions become more complicated
  Maybe what you want is actually https://mitpress.mit.edu/books/software-design-flexibility.
  
  skippyboxedhero 3 years ago
  
  Because the function of the spell is to modify the state of the wizard, the wand, and the goblin...that isn't obvious or correct. It can go on the wizard, it can go on the wand but the problem is that the solution is very brittle (because you will likely end up with inheritance as you add other objects). Again, the key point is that in OOP these cross-cutting concerns will likely end up with a design that is complex/arbitrary/unperformant.
  This is a good tutorial on this topic - https://ericlippert.com/2015/04/27/wizards-and-warriors-part...
dustingetz 3 years ago

to use Rich Hickey’s definitions, data is an observation or measurement at a point in time. [:fullname “John Doe” t1] [:first “John” t1] no code needed to see the denormalization. Which came from the symbolic model that was chosen. the code only exists to translate between incompatible data models of a program’s inputs and outputs.
snidane 3 years ago

> Why would I write a separate schema when the object itself knows what it will accept, what is optional, and what range the values should be in?
Because data is just data and the meaning to it is given at the time of application. If you want to couple validation to the data itself - how do you decide which N of the meanings to validate against?
- galaxyLogic 3 years ago
  
  No, data must have meaning, else it is meaningless.
  If you want to process the data you must use some language to access parts of the data. Data must have a symbolic representation, not just be 1s and zeros. Or it can be a bit-stream, but even then you need a language that knows the difference between 1 and 0.
  person.name
  extracts the field 'name' from the data. To manipulate the data, the program must know that there is such a field as 'name' it can ask for.
oivey 3 years ago

OO schemas are very strict and in many situations difficult to extend. For example, let's say you have a class named Name that contains firstName and lastName. Let's say that you have a function that consumes lists of Names. Let's say you have yet another class called OtherName that contains firstName and lastName. That class will not be compatible with the function. Usual OOP suggests you solve this via inheritance, but if you don't own Name or OtherName that won't help you. OOP's tools for polymorphism are very limited, especially if you don't own all the code you're trying to use (third party libraries). If the "schema" enforced by the type system didn't include the name of the object that opens up a lot of possibilities.
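A sketch of what a "schema" without the class name can look like, using Python's `typing.Protocol` (structural typing). The `HasNames` protocol and the sample values are assumptions for the example; `Name` and `OtherName` are unrelated classes, as they would be if they came from two third-party libraries:

```python
from dataclasses import dataclass
from typing import Protocol


class HasNames(Protocol):
    """Structural requirement: anything with these two string attributes."""
    firstName: str
    lastName: str


@dataclass
class Name:
    firstName: str
    lastName: str


@dataclass
class OtherName:  # unrelated to Name, no shared base class
    lastName: str
    firstName: str


def full_names(people: list[HasNames]) -> list[str]:
    return [f"{p.firstName} {p.lastName}" for p in people]


result = full_names([Name("Ada", "Lovelace"), OtherName("Hopper", "Grace")])
```

At run time this is plain duck typing; the protocol only matters to a static checker such as mypy, which should accept both classes without either of them inheriting from `HasNames`.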
- jcelerier 3 years ago
  
  > OO schemas are very strict
  I mean, it's the whole point. You want to have something that will give you compile errors whenever you change anything to make sure that you go all over the cases where the change has an impact
  
  oivey 3 years ago
  
  The goal of OOP or any programming system is to help you enforce invariants to improve correctness of a program. “has String members ‘firstName’ and ‘lastName’” is a perfectly reasonable invariant that isn’t all that well served by traditional OOP. You can’t strictly enforce the invariant I just stated via OOP. You can only define tangentially related concepts via inheritance.
  
  jcelerier 3 years ago
  
  > You can’t strictly enforce the invariant I just stated via OOP.
  but you can strictly enforce it in pretty much every relevant OOP language - Java, C#, C++, C, Rust, D, ... by defining a class / struct / record with these two members, which is the only thing that matters. No one programs in abstract design principles, only in actual programming languages.
  
  oivey 3 years ago
  
  Ok, show me in C++. You have these three structs, which you cannot alter the signature of (maybe from an external library, maybe you don't want to introduce a type hierarchy in someone else's code, etc).
  struct Name {
      std::string firstName;
      std::string lastName;
  };
  struct AnotherName {
      std::string lastName;
      std::string firstName;
  };
  struct YetAnotherName {
      std::string middleName;
      std::string lastName;
      std::string firstName;
  };
  Write a single function that returns firstName and lastName concatenated. Bonus: write it for any struct containing firstName and lastName. The only way I can think to do it is via templates, which aren't traditional OOP and have their own downsides. Concepts in C++20 look like they make this much easier, but, again, not expressed via traditional object orientation and still infects your code with templates.
  This isn't theoretical. I often don't want and often cannot use inheritance-based polymorphism. If I'm using a language where that is the only option, I'm stuck writing tons of redundant, error-prone, pointless, and brittle glue code. The amount of glue explodes combinatorially. That glue code can contain errors that the type checker won't find.
  The inverse of this problem is also interesting. Someone wrote a function to concatenate the strings in Name. I can't put AnotherName into it unless the original author had the forethought to make their function templated. I guess the future of C++ is that all code ever lives in headers.
  
  jcelerier 3 years ago
  
  > The goal of OOP or any programming system is to help you enforce invariants to improve correctness of a program.
  enforcing invariants means reducing the number of types that satisfy the invariant. I wouldn't call doing what you ask for "enforcing" any invariant.
  > Bonus: write it for any struct containing firstName and lastName.
  well,
  auto concatenate(auto t) { return t.firstName + " " + t.lastName; }
  satisfies your condition - here's one that'll also handle middle names: https://gcc.godbolt.org/z/YfTYs6MMn ; again I don't think anyone wants this when they say "enforcing invariants", this does exactly the opposite of what is actually wanted.
  > I guess the future of C++ is that all code ever lives in headers.
  or in modules, which makes it very similar to other languages with generics instantiation
  
  oivey 3 years ago
  
  Ahh, you’re right, I forgot about fully auto’d functions. I think I want more structure than just auto everything, although the compilers should find mistakes with that. Concepts will be nice.
  
  jcelerier 3 years ago
  
  The concept version here would look like:
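  #include <concepts>
  #include <string>

  template<typename T>
  concept WesternishName = requires (T t) {
      // both members must exist and be convertible to std::string
      { t.firstName } -> std::convertible_to<std::string>;
      { t.lastName } -> std::convertible_to<std::string>;
  };

  auto concatenate(WesternishName auto t) {
      return t.firstName + " " + t.lastName;
  }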
- crabmusket 3 years ago
  
  This is the unfortunate side effect of the Javafication of OOP. Smalltalk doesn't have that problem. Neither does TypeScript, but it has taken us this long to start undoing the Javafication process.
ajuc 3 years ago

I think in Data Oriented Programming it's fine for your code to depend on data structure. You want to separate code and data, not to decouple them.
As for why see for example Command Query Separation (the data oriented way) vs Tell Don't Ask (the encapsulate everything way).
drpyser22 3 years ago

For validation, this approach would have you write a set of functions to validate the properties of the data.
Nothing forbids a function that applies validation to inputs before returning a data object? Extensibility can be done through functional means(e.g. higher order functions, function composition, lens) or oop(strategy pattern and equivalents, code object composition and inheritance,...). Not sure what you mean by more compact and code-oriented?
Code is always coupled to an interface, implicitly or explicitly. In the case of oop, code is coupled to the class, which can represent something specific with very concrete semantics (e.g. employee, author) or something generic that is meant to be subclassed(e.g. person).
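A small Python sketch of "a set of functions to validate the properties of the data", extended through higher-order functions. All names here (`required`, `in_range`, `all_of`, `make_employee`) and the age bounds are made up for illustration:

```python
def required(field):
    """Build a check that the field is present."""
    def check(data):
        return [] if field in data else [f"{field} is required"]
    return check


def in_range(field, lo, hi):
    """Build a check that an optional numeric field lies within [lo, hi]."""
    def check(data):
        value = data.get(field)
        if value is None or lo <= value <= hi:
            return []
        return [f"{field} must be between {lo} and {hi}"]
    return check


def all_of(*checks):
    """Compose checks into one check that collects every error."""
    def check(data):
        return [err for c in checks for err in c(data)]
    return check


validate_employee = all_of(required("name"), in_range("age", 16, 120))


def make_employee(data):
    """Validate the input first, then return a plain data object."""
    errors = validate_employee(data)
    if errors:
        raise ValueError("; ".join(errors))
    return dict(data)
```

Extending the rules means passing another check to `all_of`; nothing has to be subclassed.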
randcraw 3 years ago

These examples spring to mind: 1) high performance computing (vector processing / SIMD), 2) deep neural nets, 3) graphics. Each of these computation models process a small number of large blocks of data whose efficient movement is just as important as their efficient number crunching. OOP doesn't serve those emphases as well as DOP does.
alphanumeric0 3 years ago

Thank you for all of the different viewpoints, it's starting to make sense now. I've used JSON schema before in one of my previous projects. I'll keep it in mind for next time.
roenxi 3 years ago

Code is itself data, so a full decoupling is logically impossible.
Data is going to have an implicit schema regardless because that is just how data works. And once there is a schema, it may as well be expressed explicitly independently of the code because then you get the whole basket of standard schema operations for free (validate the data against a schema, provide a schema to an external consumer when moving data around, talking/operating more generally on schema to manipulate data, generating glue code or APIs programmatically).
Your description sounds like you are using your objects as schema references which is fine, but if there is a 1:1 correspondence with schema then you are already doing data oriented programming, and if there isn't then you can't have 3rd party libraries that support schema-based operations. And losing those schema-based operations hasn't gained anything because the data still has a schema, it just isn't well organised.
TLDR; Data oriented programming isn't essential. But if you plan on passing data around between systems schema should be mandatory, and if you pass data around within a system schema are recommended.
> Why would I write a separate schema when the object itself knows what it will accept, what is optional, and what range the values should be in?
In practice, I have seen a fair number of complex objects where that information is obscure. If there isn't an explicit schema there is a chance of bugs where the object doesn't understand the data it is ingesting and that it won't share its knowledge that there is a problem until it fails in some obscure way in runtime. It wastes a lot of time fixing those bugs because the easiest way to clean that up is to tease out an explicit schema & start thoroughly validating inputs.
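As a rough illustration of expressing the schema independently of the code, here is a Python sketch where the schema is itself plain data. `AUTHOR_SCHEMA`, `validate` and `describe` are hypothetical names, and a real project would more likely use an established format such as JSON Schema:

```python
import json

# Hypothetical schema, written as plain data rather than as class code.
AUTHOR_SCHEMA = {
    "firstName": {"type": str, "required": True},
    "lastName": {"type": str, "required": True},
    "books": {"type": int, "required": False},
}


def validate(data, schema):
    """Return a list of problems; an empty list means the data conforms."""
    errors = []
    for field, rule in schema.items():
        if field not in data:
            if rule["required"]:
                errors.append(f"missing field: {field}")
        elif not isinstance(data[field], rule["type"]):
            errors.append(f"wrong type for {field}")
    return errors


def describe(schema):
    """A second schema operation: export the schema for an external consumer."""
    return json.dumps(
        {f: {"type": r["type"].__name__, "required": r["required"]}
         for f, r in schema.items()},
        sort_keys=True,
    )
```

Both operations are driven by the same schema value, which is the "basket of standard schema operations" idea in miniature.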
- deltaonefour 3 years ago
  
  > Code is itself data, so a full decoupling is logically impossible.
  Nah, code exists in a different universe than "data", and it's decoupled by default. Runtime data is completely unaware of "code", unless you use reflection, which is a rarely used feature.
deltaonefour 3 years ago

If you have a hard time thinking this way then you really need to try other paradigms in coding because OOP is not the only way and it is getting less and less popular.
For example, C is not OOP. Linus Torvalds hates OOP, so Linux is written in C. Go was co-created by Rob Pike, who's also subtly against it, and it shows in the language. Additionally, Rust pretty much gets rid of objects as well. React is also moving away from class-based representation of components.
These are just modern languages that are moving away from OOP. In addition to this... behind the modern languages there's a whole universe and history of other styles of programming.
Not against OOP... But I'm saying it's not a good sign if OOP is the only perspective you're capable of seeing.
- discreteevent 3 years ago
  
  I think all of your examples allow encapsulation of data behind polymorphic interfaces. So, not data oriented. This includes the Linux kernel [1]
  [1] https://www.cs.cmu.edu/%7Ealdrich/papers/objects-essay.pdf
  
  deltaonefour 3 years ago
  
  Polymorphism and encapsulation are not features exclusive to OOP.
  Isomorphisms exist everywhere. You could say nothing is OOP because it compiles into assembly anyway where pretty much all those concepts don't even exist. You can even say all of assembly can decompile into OOP so everything is OOP.
  Think about it. If Linus Torvalds hates OOP, what is it that he hates? You can say Linus is an idiot and everything is OOP, so he's wrong.
  Or you can understand what it is about OOP that he hates, and understand the delta between my examples and what is traditionally thought of as OOP.
  None* of my examples contain a class. That should be a hint, as that syntax is pretty much used in most OOP languages.
  *React still contains class based components except the creators now recommend moving away from class based syntax to functional components.

g9yuayon 3 years ago

Reading the discussion here, I can't help but think that people are defending their own philosophies: OOP vs FP vs DOP vs etc. I wish the author had killer applications or killer examples in different categories, like can I code an operating system easier, can I code a database easier, can I create a complex streaming job easier, can I write a library as complex as Apache Beam easier, can I write a compiler easier, can I create a web framework easier, can I write a JSON parser easier, you get the idea. Or maybe examples that contrast existing solutions: how do I use DOP to write a better RxJava? How do I use DOP to write a better SQLite? How do I use DOP to write a better graph library? How do I use DOP to write a better tensor library? How do I use DOP to write a better Time/Date library? You know, something that's so compelling and so obvious.

ozim 3 years ago

I have to agree here - there is total disconnect from context in these discussions.
I am writing business line applications - I don't have much need for "generic" functions like outlined in the article. My framework/language provides for example generic .Sum() I could use if I implement specific interface.
But usually I have to make specific sum and put it in database or in the interface.
Like I need to sum age or sum prices or sum amount of items in inventory - and I have to show these in the interface. I think it is quite BS to say there can be "generic" data structure and "generic" functions in context of business line application.
Another thing I worked on was a warehouse automation system. If I had X, Y, Z coordinates, I kept them in a generic data structure named Coordinates - but any function that did anything with coordinates had to be implemented in the context of a machine. For example, a lift should never operate on the X coordinate. I could calculate distances - but there was never a use case for the distance between machines, because they had static access points and one would only calculate distances to those access points.
throwaway894345 3 years ago

You'll have to define "OOP" first. Everyone thinks it's defined, but even among OOP proponents there isn't consensus ("it's about message passing", "it's about encapsulation", "it's about inheritance", "it's about dot-method syntax", etc).
- g9yuayon 3 years ago
  
  My point is that the author needs to make it clear what exactly DOP does better. OOP, however it is defined, is just an example of what people discussed under the OP.
- deltaonefour 3 years ago
  
  One way to define it is to look at the thing that's unique to OOP that isn't used in any other paradigm.
  In OOP, data and methods are unified into primitives that form the basic building blocks of your program.
  This is unique to OOP. It must be the definition. Defining it in terms of message passing, encapsulation and inheritance are not as good because these concepts are used in other paradigms as well.
  
  danielscrubs 3 years ago
  
  Wouldn’t closures and let’s say “type-driven” also support your definition? It’s quite tricky as sometuple.fun() and fun(sometuple) might as well be interchangeable. If you really want to support a formal definition it would probably be best to have it in denotational semantics… not an easy task.
  
  jcelerier 3 years ago
  
  Closures aren't reusable across a whole program unlike classes since their type isn't named, and they don't allow to have more than one operation on the data held by the closure. They really are "poor man's objects" ;p
  
  uryga 3 years ago
  
  they allow as many operations as you need, just pass in a "method name" :)
  // Point must be defined before it is called: a const is not usable before its declaration
  const Point = (x, y) => (method, ...args) => {
    switch (method) {
      case 'getX': return x;
      case 'getY': return y;
      case 'toString': return `Point(${x}, ${y})`;
      case 'up': return Point(x, y + args[0]);
      // ...
    }
  };
  const p = Point(3, 5);
  p('getX');                // 3
  p('up', 11)('toString');  // "Point(3, 16)"
  (unfortunately, statically typing this requires... some work, either sth like [typescript overloads + literal types] or full-on dependent types)
  
  deltaonefour 3 years ago
  
  Most things in programming aren't formally defined. It is especially hard given the isomorphisms everywhere.
  At best what I've done here is eliminate some of the vagueness surrounding the definition, which is really all that's needed in most cases.
viebel 3 years ago

DOP is a good fit for building information systems

osigurdson 3 years ago

>> Take, for example, AuthorData, a class that represents an author entity made of three fields: firstName, lastName, and books. Suppose that you want to add a field called fullName with the full name of the author. If we fail to adhere to Principle #2, a new class AuthorDataWithFullName must be defined

Wait, what? Just add another field/property to the existing class. It's a silly example anyway as normally you would just add a function to concatenate the two strings.

The stated advantage is to be able to add the new property "on the fly". I suppose this means without changing the code. It does raise the question "what can existing code possibly do with this" (other than display it in a generic way or count the number of fields)? Furthermore, adding something new is rarely much of a problem as it is a non-breaking change. A more difficult example would be removing the "firstName" field. Assessing the impact of such a change in a large code base would be extremely difficult. Get good at grep and hope that the test suite is comprehensive.
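A short Python sketch of both cases with a plain dict (the record contents and the `greeting` function are invented for the example): adding `fullName` leaves existing readers alone, while removing `firstName` only fails when the affected code path actually runs.

```python
author = {"firstName": "Isaac", "lastName": "Asimov", "books": 500}

# Adding a field "on the fly": no new class, and existing readers are unaffected.
with_full_name = {**author, "fullName": f"{author['firstName']} {author['lastName']}"}


def greeting(record):
    """Existing code that only knows about firstName."""
    return f"Hello, {record['firstName']}!"


# Removing a field is the breaking change, and it only surfaces at run time.
without_first = {k: v for k, v in author.items() if k != "firstName"}

try:
    greeting(without_first)
    removal_detected = False
except KeyError:
    removal_detected = True
```

Nothing flags the removal ahead of time here, which is why grep and the test suite end up carrying the load.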

brunooliv 3 years ago

Truly misguided article as most of the things by the author, unfortunately. I was so excited about the book "Data-oriented programming" when it was first being released...it was so heavily publicized as well that it was constantly in my face which likely pushed me over the edge to give it a shot and buy it.

Unfortunately, not all that glitters is gold. It feels extremely beginner oriented, only touched basic concepts taught at uni level and it shows a huge disconnect between the theory and the real world work of a developer leveraging data in any way, shape or form.

Don't buy the book, it's so not worth it.

blain_the_train 3 years ago

how did it show a disconnect?
- brunooliv 3 years ago
  
  Essentially, by pushing the discussed topics as the One True Way and almost purposely choosing not to discuss trade-offs, alternatives, or disadvantages. In the real world, trade-offs are very important.

bo0O0od 3 years ago

Awful stuff. These principles could only ever make sense in a dynamic language since it's mostly manually enforcing some of the basic functionality of a type system, but the fact that he also tries to argue this style could be used in a language like C# throws that defense out the window.

https://blog.klipse.tech/databook/2022/06/22/generic-data-st...

The examples also contradict his other principles, i.e. immutability.

weavejester 3 years ago

It's not impossible to type check heterogeneous maps at compile time, but most static type systems don't support this. I think you'd certainly see much more friction trying to program like this in C# than you would in Clojure.
- jayd16 3 years ago
  
  C# has type safe anonymous types.
  // Anonymous type with an integer property
  var foo = new { Number = 1 };
  // Compile-time error if you try to assign the integer property to a string
  string s = foo.Number;
  An object is just a fancy map, after all... These are also immutable by default, which probably makes them even more relevant to this DOA discussion.
  
  weavejester 3 years ago
  
  Maps are open, while objects are closed.
- zasdffaa 3 years ago
  
  > It's not impossible to type check heterogeneous maps at compile time, but most static type systems don't support this
  I guess you mean dependent types[1], but if you don't, I'd appreciate an elaboration. If you do mean DTs, how might it look for a hetero collection?
  [1] If anybody has any good intros to dependent typing in C#, that'd be much appreciated. A web search throws up some pretty intimidating stuff.
  
  siknad 3 years ago
  
  Dependent types are types that depend on values, possibly runtime values. In C# types can only depend on other types when using generics: List<T> depends on T. In C++ there is std::array<T, n> (array with length encoded in type), where n must be known at compile time.
  With full dependent types one can write generic types like std::array and use them with runtime parameters. In dependently-typed languages there are two main types: Sigma (dependent pair) and Pi (dependent function). Example of Sigma type in pseudo C#:
  (uint n, Array<int, n> a); // array of any size
  Pi:
  Array<T, (n + 1)> push(Type T, uint n, Array<T, n> a, T x) { ... }
  Array<int, 3> x = push(int, 2, [1, 2], 3); // [1, 2, 3]
  Generic function f<T> is similar to a dependently typed one with `Type T` argument (requires first class types and many DT-langs have them). Values, on which types may depend, shouldn't be mutable, while C# function arguments are mutable.
  A bit larger example:
  void Console.WriteLine(string format, params object[] args);
  Using dependent types you can transform `format` into a heterogeneous array type containing only arguments specified in the format string.
  WriteLine("{%int} {%bool}", ?); // ?'s type is an array with an int value and a boolean value.
  A heterogeneous map may be implemented as a map T with keys mapped to types and another map where keys are mapped to the values of a corresponding type in T. Probably this is not a good representation, but it is a valid one.
  
  zasdffaa 3 years ago
  
  Ouch! But thanks! I'll have a good chew over this evening.
  I know it's possible to have DT in C#, I think your literal of '3' in the above example is constructed using the successor function (well, its moral equivalent) at compile time, I just can't find the article I read and there's very little out there for DT in C# at all. And it's over my head until I sit down with a good example and work it through.
  I could use some simple DT for list lengths in my project.
  
  weavejester 3 years ago
  
  siknad and uryga give good replies on dependent types and TypeScript's structural typing. There's also an experimental static typing system for Clojure that allows for typing of maps:
  (defalias NamedMap
    (HMap :mandatory {:first-name Str, :last-name Str}))

  (ann full-name [NamedMap -> Str])
  (defn full-name [{:keys [first-name last-name]}]
    (str first-name " " last-name))
  If you can determine some subset of the keys of a map at compile time, then you can type check it to some degree.
  
  uryga 3 years ago
  
  while you probably can do this with dependent types, i'd imagine GP means something along the lines of typescript's structural typing, i.e.
  const post = {
    id: 123,
    content: "...",
    published: true,
  }
  // TS infers the type of `post` to be an unnamed "map-ish" type:
  // { id: number, content: string, published: boolean }
  JS objects are map-like, and this one is "heterogenous" in that the values are of different types (unlike most maps in statically typed langs, which need to be uniform). this is just "structural typing", the easier way to do stuff like this.
  now, dependent types allow you to express pretty much arbitrary shapes of data, so you can do heterogenous collections as well. i haven't read about it enough to do a map, but a list of tuples (equivalent if you squint) is "easy" enough:
  [ ("id", 123), ("content", "..."), ("published", True), ]
  in Idris, you could type it as something like this:
  -- a function describing what type the values of each key are
  postKeyValue : String -> Type
  postKeyValue k =
    case k of
      "id" -> Int
      "content" -> String
      "published" -> Bool
      _ -> Void -- i.e. "no other keys allowed"

  -- now we're gonna use postKeyValue *at the type level*.
  type Post = List (k : String ** postKeyValue k)
  -- "a Post is a list of pairs `(key, val)` where the type of each `val` is given by applying `postKeyValue` to `key`."
  -- (read `**` like a weird comma, indicating that this is a "dependent pair")
  more on dependent pairs: https://docs.idris-lang.org/en/latest/tutorial/typesfuns.htm...
  in general if you want to learn more about DT's, i'd probably recommend looking at a language like Idris with "native support" for them. backporting DT's onto an existing typesystem usually makes them much harder to read/understand than they actually are (and don't get me wrong, they're mindbending enough on their own).
  if you don't want to bother with that, i'd look at Typescript - its combination of "literal types", "type guards" and "function overloads" can get you some of the power of DT's. see this article for some examples: https://www.javiercasas.com/articles/typescript-dependent-ty...
  
  zasdffaa 3 years ago
  
  Thanks! I guess what you are describing looks - from my very limited experience with scala - as a path-dependent type. If I'm right.
  I'm actually talking about C# because I'm working in it and I'd like to make some compile-time guarantees if possible. Or at least know how to assure a method that they are getting a list with at least 2 values in it, for example. It may not be worth the effort but it would be nice to know how.
  I've got books on DT, idris, and another DT lang, trouble is there's no call for any of this stuff in industry so they get repeatedly pushed to the bottom of the priority stack. Sickening, innit.
  
  uryga 3 years ago
  
  i haven't used scala, but from the looks of it, yeah, "path-dependent types" are a narrow subset of full dependent types, intended for stuff like this exact use case :D
  there's things you can do to track list length at the type level, but it usually involves putting your data in a special-purpose linked-list thingy: https://docs.idris-lang.org/en/latest/tutorial/typesfuns.htm...
  (the `S` and `Z` refer to peano-style natural numbers)
  although if you go that way, you can actually get a lot of that without dependent types! here's an example i found of someone doing a similar construction in C#: http://www.javawenti.com/?post=631496
  last but not least, in TS you can use the builtin support for arrays as tuples and just do:
  type AtLeastTwo<T> = [_1: T, _2: T, ...rest: T[]]
  which looks very nice, but it's pretty much only doable because the type system has specific support for array stuff like this, so not a really general solution.
  
  zasdffaa 3 years ago
  
  Thanks, yeah, did a big search for DT stuff this morning and found this one. The S/Z is the zero/successor. I need to study it. Perhaps closer to what I originally saw was this https://gist.github.com/bradphelan/26c0e84197092620359a (edit: it's not closer. Still worth a peruse though)
  I need to sit and read and butt heads with these if I can find the time.
  It's so odd there are so few examples of C# DT. The one good one I found a year ago seems to have disappeared. Maybe a conspiracy?
  Looking forward to Idrisizing or Agdicating some time!
  
  uryga 3 years ago
  
  good luck! it's quite fun :)
  another term that might be useful is "GADT" - it's kinda like a weaker form of DT, more easily expressible in, uh, normal languages. the C# one i linked is really more like a GADT, bc it all stays at the type level w/o bringing "normal" values into it. another way to do it would be a sealed class with a private constructor + static methods:
  class Foo<T> {
    private Foo(...)
    static Foo<int> Bar() { ... }
    static Foo<string> Zap() { ... }
  }
  (or sth like that, i don't really do C#!)
  so that way you can have the type param and use it to track something, but limit what can be put there - in this case, only int or string. type-safe syntax trees are a common case for sth like this (though in that case you'd probably go for an abstract base + subclasses, like in the link).
- bo0O0od 3 years ago
  
  I agree, but I also think if the author either knew what they were talking about or wasn't just trying to sell more books they'd make that clear, rather than just trying to make the case for these patterns in languages that don't suit them.
jayd16 3 years ago

Kinda funny to say seeing as C# actually has a dynamic type.
Even if you don't use that, you could certainly orient your data as "structs of arrays" instead of "arrays of structs" (so to speak). It's fairly common in games.
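
For anyone unfamiliar with the two layouts, here is a rough Python sketch. The particle fields and the `step_*` helpers are invented for illustration, and in Python this only shows the shape of the data; the cache benefits people talk about are mostly seen in compiled languages.

```python
from array import array

# Array of structs: one record (here a dict) per entity.
particles_aos = [
    {"x": 0.0, "vx": 1.0},
    {"x": 10.0, "vx": -2.0},
    {"x": 5.0, "vx": 0.5},
]

# Struct of arrays: one contiguous array of doubles per field.
particles_soa = {
    "x": array("d", [0.0, 10.0, 5.0]),
    "vx": array("d", [1.0, -2.0, 0.5]),
}

def step_aos(particles, dt):
    """Advance positions by rebuilding every whole record."""
    return [{**p, "x": p["x"] + p["vx"] * dt} for p in particles]

def step_soa(particles, dt):
    """Advance positions by walking only the two arrays involved."""
    xs = array("d", (x + vx * dt for x, vx in zip(particles["x"], particles["vx"])))
    return {**particles, "x": xs}
```

Both functions compute the same positions; the difference is only in which memory each update has to touch.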
agrafix 3 years ago

you can type check this in static languages too if the type system supports structural typing [0]
[0] https://en.wikipedia.org/wiki/Structural_type_system
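
As one concrete illustration (a sketch only, with a hypothetical `Post` shape and `summary` helper), Python's `typing.TypedDict` lets a static checker such as mypy verify the keys and value types of an ordinary dict, while the value stays a plain map at runtime:

```python
from typing import TypedDict

class Post(TypedDict):
    """Describes the shape of a plain dict; no wrapper object is created."""
    id: int
    content: str
    published: bool

def summary(post: Post) -> str:
    """Read two differently typed keys from the same map."""
    state = "published" if post["published"] else "draft"
    return f'#{post["id"]} ({state})'

post: Post = {"id": 123, "content": "...", "published": True}

# A static checker is expected to reject the next line, since "id" must be an int:
# bad: Post = {"id": "123", "content": "...", "published": True}
```

Nothing is enforced when the program runs; the checking happens in the separate static analysis step.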

mtVessel 3 years ago

There's something called "Data-Oriented Programming" and something else called "Data-Oriented Design". I can never remember which is which. This post changes nothing.

GolDDranks 3 years ago

I think the abbreviated forms of the paradigm are often more "stable" than the full names, because people keep hand-waving the names and thus mixing them up. For me, "DOD" is the thing where you are very performance-oriented and you have flat, cache friendly arrays of data and affinity to ECS (entity-component-system) stuff etc. This is clearly not it, but eyeballing it, the ideas seem somewhat compatible with it. (Except the immutability part.)
ArrayBoundCheck 3 years ago

One is ridiculous and uses immutable data. The other is made famous by a guy in a Hawaiian shirt and performs well. Designer on vacation might help you to remember which is which
- Tomis02 3 years ago
  
  Indeed. The guy in the Hawaiian shirt writes simple, fast code with the minimum amount of abstractions, and therefore can afford to finish work early and go to the beach. Everyone else is working overtime untangling a mess of objects, principles and hierarchies.
  
  mistrial9 3 years ago
  
  Larry Wall is that you!?
  
  peoplefromibiza 3 years ago
  
  Most likely he's Mike Acton.

osigurdson 3 years ago

>> static boolean isProlific (Map<String, Object> data) {
>>   return (int)data.get("books") > 100;
>> }

Could anything be more confusing with a large code base? Also, lots of nice key not found and invalid cast exception errors to debug with this approach. Sometimes boxing makes a material difference to performance as well.
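
A rough Python analog of the quoted function, to make the two failure modes concrete. The names `is_prolific_checked` and `failure_of` are made up for this sketch, and the exception types are Python's, not the ones Java would throw:

```python
def is_prolific(data):
    """Direct port of the quoted check: trusts that "books" exists and is a number."""
    return data["books"] > 100

def is_prolific_checked(data):
    """Same check, with both failure modes reported as one explicit error."""
    books = data.get("books")
    if not isinstance(books, int) or isinstance(books, bool):
        raise ValueError(f"expected integer 'books', got {books!r}")
    return books > 100

def failure_of(check, data):
    """Return the name of the exception a check raises, or None if it succeeds."""
    try:
        check(data)
    except Exception as exc:
        return type(exc).__name__
    return None
```

Calling `failure_of(is_prolific, {})` and `failure_of(is_prolific, {"books": "150"})` surfaces the missing-key and wrong-type cases separately; the checked variant at least turns both into a single, descriptive error at the boundary.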

readthenotes1 3 years ago

I haven't loved debugging systems whose primary data structure is a Map.

qsort 3 years ago
I think it very much depends on what problems you're trying to solve and whether or not proper data types have been defined.
If your primary data structure is
```
  Map<Integer, List<String>>
```
we have a huge problem.
On the other hand, if your primary data structure is
```
  Map<CustomerId, List<Purchase>>
```
Then I'd rather see that than IPurchaseMappingByCustomerIdAbstractFactory or whatever other abomination OO priests will conjure. Generally speaking, generic structures are simpler and they allow for easier transformations.
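
To illustrate, here is a small Python sketch of the second shape. `CustomerId`, `Purchase`, its fields and `total_spent` are hypothetical names chosen for the example. The container stays a generic dict of lists, but the key and element types say what the data means:

```python
from dataclasses import dataclass
from typing import Dict, List, NewType

# A distinct name for customer keys; at runtime it is still a plain int.
CustomerId = NewType("CustomerId", int)

@dataclass(frozen=True)
class Purchase:
    sku: str
    cents: int

PurchasesByCustomer = Dict[CustomerId, List[Purchase]]

def total_spent(purchases: PurchasesByCustomer) -> Dict[CustomerId, int]:
    """A generic transformation: map each customer to the sum of their purchases."""
    return {cid: sum(p.cents for p in items) for cid, items in purchases.items()}

purchases: PurchasesByCustomer = {
    CustomerId(1): [Purchase("book", 1500), Purchase("pen", 200)],
    CustomerId(2): [],
}
```

The transformation is an ordinary dict comprehension, so no dedicated mapping class or factory is needed to work with the structure.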
- marcosdumay 3 years ago
  
  The article doesn't explain, but links into a deeper one. The author really means you should use Map<Integer, List<String>>.
  He also seems to be unaware that you can have generic code with specific types.
  
  weavejester 3 years ago
  
  You can still use types, as long as they don't encapsulate the data. Admittedly this is hard to do in some languages.
skippyboxedhero 3 years ago

I am not entirely sure that most data-oriented programs do go this way. You can split functionality out of the data but the data can still be represented with an object or whatever. I would agree though, you might as well be using Python at that point.
An example of what I mean is Spring. Obviously, from what I recall, that goes to the other extreme with lots of XML configuration. But there is no need for vague types that can cause all sorts of mischief at runtime. The key idea is splitting code from data, not necessarily the representation of the data (although that can come into it if you have performance-sensitive apps).

javajosh 3 years ago

Two concepts that seem entirely missing are "ownership" and "transformation". The root of ownership is "system of record" with intermediate systems combining data and doing transformation. "Getting data" becomes a question of where, in a differentiating tree that has its root in the SoR, you connect to, and the trade-offs that implies.

This post (and the book it points to) is perhaps teaching a new generation what has been known for a long time: the "body" of your business is the data, not the code. E.g. if you have limited space on a thumbdrive and can only keep one thing in a datacenter fire, your database or your codebase, you keep the database.

frogulis 3 years ago

Rich Hickey's talks "Simple Made Easy" [1] and "Effective Programs" [2] provide a better explanation of these ideas IMO. The specific definition of "simple" is pretty crucial.

[1] https://youtu.be/LKtk3HCgTa8 [2] https://youtu.be/2V1FtfBDsLU

xixixao 3 years ago

> Adherence to this principle in OOP means aggregating the code as methods of a static class.

This is not OOP, this is a way to do functional programming in a class-based language that lacks top-level function declarations / modules.

While this might seem a nitpick, it makes me sceptical about the rest of the content.

osigurdson 3 years ago

I think possibly what they meant is "when using an object oriented language like Java or C#", aggregate the code as methods of a static class.
- hinkley 3 years ago
  
  That's not necessary either. What we're talking about here is an analog for the real problem, which is "when using an object oriented language, write pure functions even though the language doesn't make you do it".
  Static functions introduce friction against making impure functions, but not an overwhelming amount of it. If it's all you have, then there are worse safety blankets, but it's not going to solve your lack of buy-in problem. If everyone is on board then you don't need static functions. If they aren't, static functions aren't going to save you, they're just going to extend the amount of time you suffer before you wise up and get a new job somewhere else.
  
  osigurdson 3 years ago
  
  If all of the data structures are maps of maps of maps, it seems that it would be obvious to use static methods in this situation. I don't understand the use case for instance methods with this arrangement.
  
  hinkley 3 years ago
  
  Lazy evaluation for one. Organization for another.

revskill 3 years ago

No code examples? That's the sign of a bad book.

One code example is worth 1000 images, and 1 image is worth 1000 words.

Always use a code block to illustrate your point, as it helps the reader understand it better.

Writing a book is more about getting the reader into your thoughts than making them think.

pca006132 3 years ago

I thought this is talking about data-oriented design, which focuses on the data layout to make programs more efficient, e.g. structure of array that can be more cache friendly in some cases.

> Principle #2: Representing data with generic data structures.

OK probably this is not what I expected.

andreareina 3 years ago

> Breaking this principle in FP means hiding state in the lexical scope of a function.

If that's happening a lot that's not really FP anymore, is it?

throwaway17_17 3 years ago

Nothing about keeping values in functions is ‘non-functional’. That’s like saying that hard coding the quadratic formula inside some function instead of using a lambda as an input is ‘non-functional’. His language in that statement is poorly chosen. I am almost certain he is not implying any action at a distance ‘state’, he is trying to talk about including context for some data inside functions that operate on the data. It would be like hardcoding an ISBN-to-title list inside a function that takes a list of authors and their books as input for processing. I think he’s saying the ISBN-to-title should be a part of the data structure, and storing it inside the functions breaks these rules he has invented.
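
A small Python sketch of that reading, with made-up identifiers and titles (the `titles_hidden` and `titles_explicit` names are hypothetical). Both functions are pure; the only difference is whether the lookup table is visible to the caller as data:

```python
def titles_hidden(authors):
    """The ISBN-to-title list is baked in; callers cannot inspect or replace it."""
    lookup = {"isbn-a": "First Book", "isbn-b": "Second Book"}
    return {name: [lookup[i] for i in isbns] for name, isbns in authors.items()}

def titles_explicit(authors, isbn_to_title):
    """The lookup travels with the rest of the data passed in."""
    return {name: [isbn_to_title[i] for i in isbns] for name, isbns in authors.items()}

# Illustrative input: authors mapped to the identifiers of their books.
authors = {"Author One": ["isbn-a", "isbn-b"], "Author Two": []}
catalog = {"isbn-a": "First Book", "isbn-b": "Second Book"}
```

With the explicit version, swapping in a different catalog is just a different argument; with the hidden one it means editing the function.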

jokoon 3 years ago

I love DOP, and I loathe OOP.

But when you use a framework that enforces OOP, it's quickly difficult to use DOP.

crummy 3 years ago

What languages/frameworks work well with DOP? Functional languages I guess?
- jokoon 3 years ago
  
  Languages:
  Python, C, C++. Not java, for example.

eurasiantiger 3 years ago

Immutability is nice but now your data migration needs quadratic memory

forgotusername6 3 years ago

Redux seems to have points 1-3. There is typically no schema though.

ncmncm 3 years ago

"Anything"-oriented programming is dumb. Every big problem is a collection of smaller, different problems. Different problems call for different approaches. Sometimes the best approach has data-oriented features, sometimes object, sometimes functional, sometimes piped. For big problems you want a language good at all of them.

It is why C++ continues to grow. Some complain about that, but every single feature got there over fierce opposition by making some common programming problem more tractable.

hinkley 3 years ago

I wonder if Alan Kay would agree or argue with this sentiment, but it seems to me that the choice of 'oriented' was intentional and that we've consistently fucked it up ever since then.
Orientation should have been 'a preference for' not 'a dogmatic adherence to'. A hot-dog based diet still contains bread, ketchup, mustard and pickles, possibly some sort of cheese. A hot dog diet is just hot dogs, which is much, much less interesting, and there is no question that it is unhealthy, whereas the former might have some plausible deniability (especially if you add beans).
- ncmncm 3 years ago
  
  Alan Kay would doubtless disagree. But he would be as wrong as anybody else.
  Preference is silly. The problem dictates its solution. The good programmer listens to the problem.
drpyser22 3 years ago

It's a good thing that these paradigms exist so you know their value, and eventually understand when they're appropriate. You're right that there may not be a one-size-fits-all.
- ncmncm 3 years ago
  
  So long as you don't confuse a "paradigm" with any objective reality. Purity is for monks.
irrational 3 years ago

I think this is also a reason people try to use JavaScript for so many things. As a multi-paradigm language you can do OOP, functional, etc. in it.
- sidlls 3 years ago
  
  People try to use JavaScript for so many things because it's (surface-level) inexpensive, not because it's good at anything.
  
  eurasiantiger 3 years ago
  
  Mind-reading and zealotry, you must be a priest. What is your denomination?
  
  galaxyLogic 3 years ago
  
  It is good enough
  
  sidlls 3 years ago
  
  Yes, like PHP. In so many ways.
viktorcode 3 years ago

I wouldn't call C++ "good" at any of those problems. "Good enough" - maybe.
- ncmncm 3 years ago
  
  No language is good compared to those that will come after we are all of us dead. But C++ is the best we have. If any other language were to catch up and pass it, we would have another choice. But C++ is not sitting still.

banq 3 years ago

it actually is Domain Driven Design, domain data-oriented programming