Virtual cells

115 points by surprisetalk 4 days ago

While it says this:

> By 2021, these engineered bacteria could be simulated in unprecedented detail. Every gene, every major protein, and nearly every metabolic reaction in JCVI-syn3A.

I think the crux is here:

> Even after years of study, 91 of JCVI-syn3A's genes remain unannotated, of which roughly one-third are essential. Deleting any single one kills the cell, yet we have no idea what they do – representing some of biology's most fundamental unsolved puzzles.

---

I think minimal cells and virtual cells are especially exciting as they open up a path to create fully controlled experimental environments for biochemistry from the ground up.

Right now sooo much time in biochemistry goes into working around the limitations of what already happens to be present in an organism. E.g. we may know 5% of the mechanisms that go on in a cell, but the remaining 95% may still brick your experiment, and without knowing about them you essentially have to shrug and trial-and-error your way through them.

In contrast in a synthetic minimal cell, we could start out with an organism where we know 95% of the mechanisms that are going on, and then study new mechanisms one gene at a time, steadily building up to bigger and bigger mechanisms.

Strangely it seems to me that a lot of effort is going more into being able to simulate full cells that contain unknown mechanisms, rather than trying to use the capabilities to create hypotheses to uncover the unknown mechanisms. Yes, that probably expedites the path towards simulating much bigger human cells, but ultimately still leaves us in the dark on most fronts.

TeMPOraL a day ago

> Strangely it seems to me that a lot of effort is going more into being able to simulate full cells that contain unknown mechanisms, rather than trying to use the capabilities to create hypotheses to uncover the unknown mechanisms. Yes, that probably expedites the path towards simulating much bigger human cells, but ultimately still leaves us in the dark on most fronts.
I imagine it's much easier to create and test hypotheses about the unknown mechanisms, when you can view them in context of a larger system, with reasonable performance, allowing you to metaphorically "grab them in your palm" and tweak on the fly. We work better when we explore things, instead of immediately taking on problems that are at the limit of our computational tools, requiring individual brains (and tons of paperwork) to make up for the difference.
In this sense, researching the nano-scale basics, and aiming to simulate micro-scale cellular systems, are actually aligned - as long as they're not cutting too many corners, the latter is creating space for the former work to be done efficiently.
suddenlybananas a day ago

>Strangely it seems to me that a lot of effort is going more into being able to simulate full cells that contain unknown mechanisms, rather than trying to use the capabilities to create hypotheses to uncover the unknown mechanisms. Yes, that probably expedites the path towards simulating much bigger human cells, but ultimately still leaves us in the dark on most fronts.
Seems the result of this general trend in science towards brute prediction and abandoning the goal of explanation or understanding.
- filoeleven a day ago
  
  Check out Michael Levin’s lab for a refreshing and amazing example of a group that’s bucking the trend.
  They are doing tons of experiments by starting with the premise that cells and their networks have intelligence, then using tools from behavioral science to convince them to do what the experimenters want (e.g. “grow an eye here”). I’ve been convinced by Levin’s talks that this is a more promising area of research than genetics.
  https://drmichaellevin.org/

nextos a day ago

UConn coordinated a ton of work during the past two decades on mechanistic cell models. Mostly ODEs, PDEs, and stochastic ODEs. See The Virtual Cell at https://vcell.org.

It's interesting how high-throughput perturbation assays have led to data-driven whole cell models. But these are not yet good at making robust predictions.

The future is probably hybrid neuro-symbolic models.

donovanr a day ago

Yes, this. A lot of work in this field is missing from that timeline. Just circa 2010-2020, Les Loew's VCell 3D PDE approaches, Faeder et al.'s BioNetGen / ODE work, Luthey-Schulten's grid-based cell models, the Pittsburgh Supercomputing Center's 3D Monte Carlo MCell, the image-based deep learning models at the Allen Institute for Cell Science...
It's nice to see the idea of virtual cells make a comeback now, though the meaning seems to have shifted to transcriptomics-based transformer / GPU-powered models (which have issues[0]). It's a fun field / problem, but I think it will make better progress if we take advantage of all the varied computational work that has come before.
[0] Benchmarking Transcriptomics Foundation Models for Perturbation Analysis : one PCA still rules them all https://arxiv.org/abs/2410.13956
- udara 21 hours ago
  
  oh that's interesting, I didn't come across these the way I was looking at it! thank you for informing us, I will read up on these and add it to the timeline!
paulfharrison a day ago

What a strange web page. Scrolling is thoroughly broken.
I recently went to a two day workshop on whole cell modelling. I'm still trying to work out how much of the exercise is fantasy. I get that some of the chemistry is well enough understood to simulate from the ground up, but there's so much more to it.
The oddest thing to me is the level of satisfaction in being able to run the model. I would think the model has to be very very fast, because of all the work that needs to be done with it to fit it to data and fully understand its behavior.
- jdonaldson 17 hours ago
  
  Lol! Overriding basic scroll functionality on esoteric cell simulation software documentation pages is such a pointless gamble. Really calls the quality of the software itself into question.

maltee a day ago

Great article! Also, really nice site design, the referenced papers and annotations are a really nice touch!

moralestapia a day ago

Great post.

This is exactly what I'm an expert at, I even coined a term in the field [1], :).

Since I started doing this 15 years ago (and I know the field predates me by a lot), one has always had the feeling that we are so close to a big breakthrough in biological simulation, but at the same time, progress has been kind of "slow". I think the reason is that pushing the envelope forward in this field requires mastering three (maybe four) different disciplines, your pick of [Bio, Chem, CS, Math, Physics]. Very few people reach this level of simultaneous understanding of all these pieces.

I'm not trying to gatekeep the field, though, much of the progress here (including many of the papers mentioned in TFA) is work coming from PhD students. Anyone could jump into this, but you really need to sit down and try to make sense of it for a while, years. PhD gives one the perfect opportunity for that.

Anyway, I hope this thing keeps going on forward, it's one of the ultimate goals of Biology and it would be extremely beneficial to the world.

1: https://www.frontiersin.org/journals/plant-science/articles/...

ulnarkressty a day ago

A noob question, since the original article doesn't go into details - what is exactly being simulated here? I was under the impression that we can't even reliably do a single protein folding due to the sheer complexity of the task. So how do we simulate the zillions that are bouncing around in a single cell? And if we don't simulate it at that level, how are we confident that it is correct?
- andoando a day ago
  
  I assume it's the same reason we don't need to simulate quantum physics to simulate a ball moving or even the weather.
- moralestapia a day ago
  
  You're right, they're only approximations at different levels, as a 1:1 reproduction of even a single cell would be unfeasible.
  Most of them are built around one specific, measurable, phenotype that they want to reproduce, like estimate metabolite input/output over time.
  Some others attempt to model the behavior of these cells when interacting with others, like in a colony or tissue. This is quite important because most of the phenomena that enable development, healing, regeneration, etc ... are emergent processes that only make sense when you study the whole tissue. One concrete thing you can measure/simulate here is "if I drop this hormone here, where is it going to be at time X and at what concentration" [1], which is super useful to do in silico because measuring that in real tissue, without or even with markers, is much more complicated, expensive and time-consuming.
  1: I wrote one of the first models that was able to do this in realistic plant tissue. Realistic here means, bounded by the chemical/physical constraints found in real plants and using a structural scaffold that resembles them as well.
RivieraKid 17 hours ago

> progress has been kind of "slow"
Isn't it simply because it's a fundamentally hard problem that may not even be solvable? Simulating a 50-amino-acid protein in water for 1 ms on a top supercomputer using molecular dynamics would take about a week.
Can the current approach lead to models that are even remotely as useful as a full molecular dynamics simulation? The current approach requires us to first discover the hard stuff, the myriads of tiny mechanisms happening in the cell.
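To put the molecular dynamics figure above in perspective, here is a back-of-the-envelope sketch. The 1 ms and one-week numbers come from the comment; the 2 fs integration timestep is an assumption (a commonly used value for all-atom MD, not something the comment states).

```python
# Back-of-the-envelope: how many MD steps fit into 1 ms of simulated time,
# and how fast each step must run to finish in one week of wall-clock time.
# ASSUMPTION: 2 fs integration timestep (not stated in the comment above).

TIMESTEP_FS = 2                   # assumed timestep, in femtoseconds
SIMULATED_FS = 10**12             # 1 ms expressed in femtoseconds
WALL_CLOCK_S = 7 * 24 * 3600      # one week, in seconds

steps = SIMULATED_FS // TIMESTEP_FS
steps_per_second = steps / WALL_CLOCK_S
microseconds_per_step = 1e6 / steps_per_second

print(f"steps needed:       {steps:.2e}")
print(f"steps per second:   {steps_per_second:.2e}")
print(f"time budget / step: {microseconds_per_step:.2f} microseconds")
```

Under that assumption the whole force evaluation and integration for every atom has to fit in roughly a microsecond per step, which is why this kind of run is tied to top-end hardware.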
- moralestapia 16 hours ago
  
  Hmm ... a good analogy to answer your question would be.
  CFD exists and has been fundamental to shape the world as we know it (refer to Wiki page [1] to learn more about what it is and why it is important). CFD is also not a full molecular dynamics simulation, yet is useful.
  Another example could be weather models. None of them, afaik, simulate @RivieraKid typing a comment in HN, an action which, infinitesimally, affects the weather of the planet. And yet, they're still very useful.
  You work with approximations, some of them are good enough to give you 80% of the answers you want, and that 80% is more than enough to improve our quality of life significantly.
  1: https://en.wikipedia.org/wiki/Computational_fluid_dynamics
  
  RivieraKid 13 hours ago
  
  I understand the principle that models of reality don't need to be 100% accurate to be useful.
  But... the way MD, CFD or weather models approximate reality seems fundamentally different from how current virtual cell models do it.
  In MD, CFD or weather models, all of the relevant low-level mechanisms are known, that's why when we want to study a system's reaction to some external shock, we know we will get a roughly correct result with some well-defined error.
  But in a virtual cell many low-level interactions are not modelled and these interactions can have a large effect. Often they are not modelled because they simply haven't been discovered yet. With an accurate cell model we would be able to discover all of those mechanisms, that's where most of the value is.
  Here's a flawed analogy: Current virtual cell models are like LLMs. They give us permutations of what we already know but they can't discover what we haven't discovered yet.
  
  moralestapia 7 hours ago
  
  I get your point that models are limited by their own assumptions and can only get you so far.
  Hmm ... a good way to see one of their benefits is that they help you iterate quickly. Typically, research is done through a design experiment -> measure -> design experiment ... cycle; a new paradigm could be simulate -> design experiment -> measure -> simulate ... the simulation helps you design a much better experiment. You could just try to reproduce what you already know, but by studying how perturbations here and there make or break one specific phenotype, you could infer how the whole system works, and then design an experiment to test a much more narrowly defined hypothesis. An example of that is what I described on a sibling comment wrt. the behavior of plant hormone transporters.
  Simulations are also helpful to fill in many of these gaps due to lack of information or missing measurements.
  One technique that has now become fundamental in biochemistry is flux balance analysis [1]. You start from the premise of conservation of matter and, say, if you put in 2 grams of sugar, after some time X, those 2 grams of sugar or their byproducts will still be somewhere in the cell. So, if you know the inputs to a system and perhaps some of the outputs as well, and you somehow know the metabolic network involved, you can perform a simulation that approximates the rate of conversion of all byproducts, tells you which pathways are active or not, and even which ones are missing as well.
  This particular technique is also interesting because it does not have any actual biological knowledge imbued into it; it's just an abstract representation of "states" in the system and the rates of change between them through time (Chat says "Linear Programming applied to a constrained flow network"). The exact same algorithm works in ecology, economics (trade balances), electricity (circuit simulation), transport, etc...
  And yet, with a very simple setup, you can reconstruct even the most complex metabolic networks, see [2]; and from here, you can use this model to answer things like "which pathway consumes the most energy", "what is the most effective way to maximize the production of X", "where are the bottlenecks", "how can I turn off the production of Y, with the least possible changes", etc... without having to perform the actual experiment in wetware.
  1: https://en.wikipedia.org/wiki/Flux_balance_analysis
  2: https://bmcsystbiol.biomedcentral.com/articles/10.1186/1752-...
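The flux balance analysis idea described above (conservation of matter plus a known network, solved as a linear program) can be sketched on a toy network. This is only an illustration under stated assumptions: the two-metabolite network, the reaction names, and the capacity bounds are all hypothetical, and it uses SciPy's general-purpose `linprog` rather than any dedicated FBA package.

```python
# Toy flux balance analysis: maximize one flux subject to steady-state
# mass balance (S @ v = 0) and per-reaction capacity bounds.
# The network, bounds, and names below are hypothetical illustrations.
from scipy.optimize import linprog

# Reactions (columns): v0 uptake -> A, v1 A -> B, v2 B -> biomass, v3 A -> byproduct
# Metabolites (rows):  A, B
S = [
    [1, -1,  0, -1],   # A: produced by uptake, consumed by v1 and v3
    [0,  1, -1,  0],   # B: produced by v1, consumed by v2
]
bounds = [(0, 10), (0, 6), (0, None), (0, None)]  # assumed capacities

# linprog minimizes, so negate the objective to maximize the biomass flux v2.
c = [0, 0, -1, 0]
result = linprog(c, A_eq=S, b_eq=[0, 0], bounds=bounds)

max_biomass_flux = -result.fun
fluxes = list(result.x)
print("solver succeeded:", result.success)
print("max biomass flux:", max_biomass_flux)
print("flux distribution:", fluxes)
```

In this toy setup the biomass flux is limited by the assumed capacity of the A -> B step, which is the kind of "where are the bottlenecks" answer the comment refers to. The remaining fluxes are not necessarily unique; the solver returns one optimal distribution among possibly many.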
smj-edison 20 hours ago

This is exactly the field I want to enter! I really want to work on the tooling side for atomic simulation (I think I have a design that could complete each timestep in ~10usec that doesn't lose speed as it scales). I think it would be cool to automatically extract parameters for coarser grained models.
I'm planning to go to college for electrical engineering (ASIC design), but swap out some of my requirements to focus on particle physics. The college I got into also has an undergraduate MD lab that I got invited to.
Do you have any tips on what skills you've found most valuable as you've done simulation?
- moralestapia 19 hours ago
  
  >This is exactly the field I want to enter!
  Please do!
  >I really want to work on the tooling side for atomic simulation [...]
  Please do x2. That's how I started with this.
  With respect to simulations, become as good as you can with the methods that enable them. ODEs, PDEs and how to compute (well ... approximate) them. Spend some time making sure you understand the math properly. You don't need to spend ages here, most of the equations used in the field involve no more than three variables. The trick is in how you solve them, but solvers already exist, ofc.
  Write your own FEM solver, this is a must. It's not going to be SOTA, but you'll get a good feel for all the fundamentals. Then you can move on to using whichever you need/want because you'll understand what they do. (I used deal.ii a lot, but there are many more good ones out there).
  I would recommend you work with C/C++ instead of Python (ofc. you can do both). The reason for this is that you'll kill two birds with one stone by getting a good sense of how a computer actually executes things. Python is too abstracted away from that, and in this particular field, you really need to know what the hardware is doing and how.
  Then move into GPUs. Actually, if you can write this FEM solver in something like CUDA or JAX, you will kill three birds with one stone.
  I would then try to join a research group doing this, even if it's for free, only a couple months. My personal preferred niche is what is now called morphodynamics. Just approach any of these groups and tell them "Hey, I'm good with CUDA/JAX, I wrote this solver, I made this small simulation, I'm interested in doing biological simulations and want to learn more", 8/10 will say yes, there aren't many people out there with this skillset, it's not crowded.
  I mentioned "even if it's for free", the thing is, everyone says yes to free, lol. Your goal is to get one paper out there with your name on it. It doesn't have to be your own idea/project, just help them build whatever they're doing and make sure your name is there. Then ... you're pretty much in. You can stay there or go to a different group, but now you can say "hey I worked on this project, here's the paper, I can do X" and you take it from there. :D
  If you want to be in touch email me hn @ moralestapia . com. My profile seems empty bc. our friend @dang hid it, but just send me an email.
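As a starting point for the "write your own FEM solver" advice above, here is a minimal sketch of a 1D finite element solver. It is written in Python for brevity (the comment recommends C/C++ for the real exercise), and the problem choice (Poisson's equation with zero boundary values, linear elements, uniform mesh, midpoint quadrature) is an assumption made for illustration, not something taken from the thread.

```python
# Minimal 1D finite element solver for -u''(x) = f(x) on [0, 1], u(0) = u(1) = 0,
# using piecewise-linear ("hat") elements on a uniform mesh.

def solve_poisson_1d(f, n_elements):
    h = 1.0 / n_elements
    n_nodes = n_elements + 1
    # Global tridiagonal stiffness matrix stored as three bands, plus load vector.
    lower = [0.0] * n_nodes
    diag = [0.0] * n_nodes
    upper = [0.0] * n_nodes
    load = [0.0] * n_nodes

    # Assemble element by element: local stiffness is (1/h) * [[1, -1], [-1, 1]].
    for e in range(n_elements):
        left, right = e, e + 1
        diag[left] += 1.0 / h
        diag[right] += 1.0 / h
        upper[left] -= 1.0 / h
        lower[right] -= 1.0 / h
        # Midpoint quadrature for the load integral of f times each hat function.
        f_mid = f((left + 0.5) * h)
        load[left] += f_mid * h / 2.0
        load[right] += f_mid * h / 2.0

    # Dirichlet boundary conditions: replace first and last rows with u = 0.
    for node in (0, n_nodes - 1):
        lower[node] = upper[node] = 0.0
        diag[node] = 1.0
        load[node] = 0.0

    # Thomas algorithm (Gaussian elimination specialized to tridiagonal systems).
    for i in range(1, n_nodes):
        m = lower[i] / diag[i - 1]
        diag[i] -= m * upper[i - 1]
        load[i] -= m * load[i - 1]
    u = [0.0] * n_nodes
    u[-1] = load[-1] / diag[-1]
    for i in range(n_nodes - 2, -1, -1):
        u[i] = (load[i] - upper[i] * u[i + 1]) / diag[i]
    return u

nodal_values = solve_poisson_1d(lambda x: 1.0, 8)
print(nodal_values)
```

For the constant right-hand side used here the exact solution is x(1 - x)/2, so the printed nodal values can be checked by hand. The same assemble-then-solve structure is what libraries such as deal.ii generalize to 2D/3D meshes and higher-order elements.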
  
  smj-edison 15 hours ago
  
  Thank you so much for the detailed reply!
  I've worked a lot with Rust, and a decent bit of C, but pretty much no C++ or GPU programming. Do you have a sense of whether it would be better to pursue CUDA or JAX?
  That's also encouraging to know that there are openings for computational simulation; I've been a little worried that there wouldn't be any, since it seems like a rather small field.
  Off the top of your head do you know of any resources for learning FEM? Happy to look for it myself but it's always nice to have pointers.
  
  moralestapia 14 hours ago
  
  If I had to choose I would choose CUDA. I recently got myself into JAX, I think it has a fair chance of being the dominant framework in 5-10 years; but also, a native version of CUDA is coming to Python so ... idk.
  deal.ii has a bunch of tutorials worth gold [1].
  I learned by following them and that's why I ended up using deal.ii for almost everything. If you know C well, moving to C++ won't be very difficult. deal.ii uses a lot of templates, that would prob. be the most unusual thing to you, coming from C, but you'll get used to the syntax.
  Pay special attention to the heat equation [2], tutorial #26. Diffusion is modeled by a trivially modified version of it; heat uses a constant diffusion coefficient, in molecular diffusion this can vary.
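The "heat equation with a non-constant coefficient" point above can be sketched in a few lines. This is a minimal 1D finite-volume illustration, not the deal.ii tutorial's method: the explicit time stepping, the closed (no-flux) boundaries, and the specific coefficient values are all assumptions chosen to keep the example short.

```python
# Explicit finite-volume steps for 1D diffusion with a spatially varying
# coefficient: dc/dt = d/dx( D(x) * dc/dx ), with no-flux (closed) boundaries.
# With D constant this reduces to the classic heat equation.

def diffuse(conc, d_at_faces, dx, dt, n_steps):
    """conc: cell concentrations; d_at_faces: D at the len(conc) - 1 interior faces."""
    d_max = max(d_at_faces)
    if dt > dx * dx / (2.0 * d_max):
        raise ValueError("dt too large for the explicit scheme to be stable")
    c = list(conc)
    for _ in range(n_steps):
        # Fick's law flux across each interior face (positive = left to right).
        flux = [-d * (c[i + 1] - c[i]) / dx for i, d in enumerate(d_at_faces)]
        # Boundary faces carry zero flux, so total mass is conserved.
        flux = [0.0] + flux + [0.0]
        c = [c[i] - dt / dx * (flux[i + 1] - flux[i]) for i in range(len(c))]
    return c

# Hypothetical setup: a pulse in the middle, diffusion twice as fast on the right half.
n = 21
dx = 1.0
start = [0.0] * n
start[n // 2] = 1.0
d_faces = [0.5 if i < n // 2 else 1.0 for i in range(n - 1)]
end = diffuse(start, d_faces, dx, dt=0.25, n_steps=200)
print("total mass before:", sum(start), "after:", round(sum(end), 12))
```

Storing D on the faces between cells, rather than in the cells, is what keeps the scheme conservative when the coefficient varies; swapping `d_faces` for a constant list recovers plain heat-equation behavior.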
  Roughly speaking, about 50% of what you'll model will be derived from effects related to diffusion, another 40% will have to do with "tangible" mechanical effects (think of pressure, structure, tension), the remaining 10% has to do with parametrizing the model properly for the Biology and Chemistry involved.
  Although, the 90/90 rule applies and you'll end up spending a lot of time on this "last" parametrization. A lot of these constants are unknown, you'll have to guesstimate them based on whatever plausible theory you can come up with. But this is not a bad thing, recall that we are doing simulations in silico, so, you could (and should) try all ranges of parameters and study the results. You might find something that makes a lot of sense and then you can work backwards and make a prediction on the operative range of these unknown parameters.
  I did that with my plant model, there was a hormone transporter for which the rate of transport had not been determined. So, I tried a wide range of values and measured the viability of the phenotypes over it; it very clearly showed that only a narrow window of values allowed it, at a very specific time during plant development, derived from the geometrical configuration of that specific spot at that moment. We then devised an experiment for that, measured it, and it was bullseye over what I predicted. See [3].
  That would have made for a killer paper if I had been rigorous enough to publish it, but I got my M.Sc. out of it and kind of forgot about it. Don't do that, publish everything you find!
  1: https://dealii.org/current/doxygen/deal.II/Tutorial.html
  2: https://en.wikipedia.org/wiki/Heat_equation
  3: https://moralestapia.com/img/Fig.19.png
  3 Footnote: PIN1,4,7 are hormone transporters in Arabidopsis thaliana (a model plant). A specific balance of transporter activity is required to recover phenotype. In the plot, the darker the region, the more likely it is to reproduce the phenotype. The red dot is the optimal value, but anything in the black area is also really good, and the value we measured fell within it.
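The "try a wide range of values and see which ones reproduce the phenotype" workflow described above can be sketched generically. Everything in this example is a hypothetical stand-in: the one-compartment model, the rate constants, and the "viable" concentration band are invented for illustration and have nothing to do with the actual plant model or its results.

```python
# Parameter sweep over an unknown rate constant: run a toy simulation for each
# candidate value and record which values reproduce the target behaviour.
# Everything here (model, numbers, "viable" band) is a hypothetical stand-in.

def simulate_final_concentration(transport_rate, production=1.0, decay=0.1,
                                 dt=0.01, t_end=200.0):
    """Forward-Euler integration of dc/dt = production - (transport_rate + decay) * c."""
    c = 0.0
    for _ in range(int(t_end / dt)):
        c += dt * (production - (transport_rate + decay) * c)
    return c

def viable(concentration, low=0.8, high=1.25):
    """Hypothetical phenotype criterion: concentration must end inside a band."""
    return low <= concentration <= high

# Sweep the unknown rate across three orders of magnitude (log-spaced).
candidates = [10 ** (-2 + 3 * i / 60) for i in range(61)]
window = [k for k in candidates if viable(simulate_final_concentration(k))]
print(f"viable transport rates: {min(window):.3f} to {max(window):.3f}")
```

The output of such a sweep is a predicted operating range for the unknown parameter, which is the quantity one would then try to confirm experimentally. In a real model the simulation step is far more expensive, so the sweep is usually the part worth parallelizing.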
  
  dang 15 hours ago
  
  > My profile seems empty bc. our friend @dang hid it
  I actually undid that yesterday, after replying to you here: https://news.ycombinator.com/item?id=44321299. I did so because I didn't want to give the wrong impression of having punished you after moderating you, when in fact the two issues are unrelated.
  Since you brought it up, I'll clarify here: you're abusing your profile page to publish false and misleading claims about HN. Specifically, you say that "some users are favored so they get more upvotes" and "some users get a handicap, so upvotes to their accounts do not register". Both claims are untrue.
  I understand that HN's voting algorithm is hard to figure out from the outside—it needs to be, for several reasons, such as that people are constantly trying to game it. But that doesn't make it ok to publish damaging falsehoods about HN. A user who doesn't know this place well, who happens to read that, will come away with an untrue impression which could easily discourage them from participating here.
  Had you said those things in a comment, we could provide corrective information in a reply—but there's no way to do that on someone's profile page. Besides that, the About box isn't supposed to be for venting grievances or taking revenge on HN (as people are sometimes wont to do). When people abuse the About box in such ways, I think it's reasonable to hide it.
  Now that I've made it clear what the issue is, and that it is unrelated to the other moderation reply, I'm going to hide your About box again. If you want to edit it to take out the false claims, I'd be happy to reverse that again.
  ---
  Edit: it turns out that I emailed you when we originally did this back in March, and explained all of the above:
  "This is unrelated to the moderation reply I just posted at https://news.ycombinator.com/item?id=43520108, other than that I happened to look at your profile page while writing it.
  I just noticed that you have this in the About box of your profile: "PSA: HN has a hidden algorithm that manipulates the vote count for specific lists of users. Some users are favored so they get 10x more upvotes, some other users get a handicap on them, so upvotes to their accounts do not register. So, don't take karma at face value, as it is not "honest". tl;dr, even HN is propaganda."
  That's entirely false and badly misleading of others. I'm not ok with that being published on HN in a place where there's no way to answer or correct it, so I've turned off the About field in your public profile.
  I have nothing against you and you're welcome on HN, but not to make false statements like this which poison others against the site and the community. If you want to take that out of your profile and let me know when it's done, I'll be happy to restore your About field to public view.
  Daniel (dang)
  
  moralestapia 14 hours ago
  
  Ok, I removed that bit.
  I'll check that email, also.
  
  dang 14 hours ago
  
  Ok! I've removed the penalty on your profile.
  
  moralestapia 14 hours ago
  
  Fair! :D
  Thanks @dang.
  I'm probably the most scolded user on the site, that's still alive, lol.
  
  aspenmayer 9 hours ago
  
  > I'm probably the most scolded user on the site, that's still alive, lol.
  If you want to feel better, check my comment history. I have also been scolded by mods and senior users here, and I haven’t always handled it well in the moment, to my great shame. The fact that I’m still able to post here is thanks to much gracious hand-holding, patience, and generosity of the mods and legacy users.
_factor a day ago

Thank you for your contributions. You are quite literally saving lives.
Are there any good local (op-so ideally) tools and/or libraries one can experiment with? I have access to a couple HPC clusters and would love to learn more.
- moralestapia a day ago
  
  Sure!
  Take a look at SimTK [1].
  And I would try to reproduce Karr's model [2], paper here [3]; also mentioned in the linked page.
  This is the study that made me, and many others at the time, actually take this seriously, lol. I was a student and was doing this as a hobby project, Karr's paper made me think "wait, this is actually possible, and today". It's really good if you want to learn and get your feet wet on this.
  If you want, you can reach out to me at hn @ moralestapia . com, and I'll be happy to recommend some more stuff!
  1: https://simtk.org/
  2: https://simtk.org/projects/wholecell
  3: https://www.cell.com/cell/fulltext/S0092-8674(12)00776-3
udara 21 hours ago

Thank you! and it's awesome you can contribute to the subject!
You're so right that it feels so difficult to make sense of because of how cross-disciplinary it is. I hope more people invest and work on this stuff as well. I'm hoping to learn more over the years!