Welcome to the treadmill.
This is likely the most technical project I’ve undertaken. Get ready for a long one…
Aggrecan, a macromolecule in the cartilage of your hip joint, is part of the spring in your step. It creates the chemical environment that cushions bone. When it degrades and escapes the cartilage, you are on the road to osteoarthritis. That's what has happened to three of my parents’ four hips so far, so I figured I would learn more for when the titanium implant bell tolls for me. While I studied my dad's limp closely and reproduced it here in the form of a greenish-blue torso and legs, which I think looks pretty cool, this is in fact much more of a molecular mechanism of disease (MoD) animation, highlighting the action of an enzyme that cleaves aggrecan within the articular cartilage of the femoral head. This enzyme, aggrecanase (ADAMTS-4 or -5, zinc metalloproteinases related to the matrix metalloproteinases), breaks down aggrecan as part of cartilage remodeling and repair, allowing aggrecan to part with its anchor to large hyaluronan molecules and leave the extracellular matrix. In osteoarthritis, aggrecanase works overtime while aggrecan is not replaced, leading to loss of cartilage.
the limp
I strove to make this portion as accurate, cinematic, and comprehensible as possible, which is inherently a tough balance. In a nutshell, everything in the molecular animation is way less crowded and way, way slower than it would be in a real, live extracellular matrix. This helped with the cinematic quality and the intelligibility of the main actors in each scene. To see more about how crowded the molecular stage gets, along with other principles of animated molecular representations, check out this great summary by Stuart Jantzen.
The sizes and proportions of cells, fibers, and other macromolecules, however, are true to scientific reality (as close as the 50+ journal articles got me to it at least). 
Why not ePMV?
Since this project was about being as physically accurate as I could muster, without making an illegible Brownian mess, I wanted to use the best tools available for the 3D package I was using (Cinema 4D R21), which in this case would have included a plugin called ePMV (embedded Python Molecular Viewer). So why did I use something else instead?
ePMV does lots of things by default. It creates a lot of objects and nulls as soon as you fetch or load a PDB, and immediately puts over 100 materials in the material manager. The atomic representation is done through instances: hundreds or thousands of sphere instances, which, to be fair, are straightforward to work with individually. But I figured I could make something more streamlined using cloners and tags that would perform better with multiple macromolecules in a scene at once. From ePMV I really only wanted the point clouds and backbone splines, and creating those could be scripted as well. To get the atomic species and position data into a Python script, I looked for an existing PDB-parsing Python package (very quickly, after briefly considering writing my own). Luckily, Biopython, a package that has been used in computational biology and bioinformatics for a couple of decades, had everything I needed, so I installed it into the C4D Python environment. Then I wrote a series of Python scripts to fetch the PDB (or mmCIF) file, create a point cloud, and add an atom cloner, the backbone spline for rigging (plus another script for adding α-carbon joints and a skin deformer that binds each residue's points to its joint), B-factor (temperature) tags, color and atomic radius tags, and various point selection tags based on specific residue or atom input.
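For readers curious what those scripts actually pull out of a structure file, here is a minimal sketch. It is not the scripts used for this project, and it deliberately swaps Biopython for a standard-library reader of the fixed-column PDB format so it runs anywhere. With Biopython the equivalent is roughly `PDBParser(QUIET=True).get_structure(...)` (or `MMCIFParser` for mmCIF) and iterating `structure.get_atoms()` for each atom's `coord`, `element`, and `get_bfactor()`. The `Atom` class and function names below are hypothetical, and no Cinema 4D calls are shown.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Atom:
    serial: int
    name: str        # atom name, e.g. "CA"
    res_name: str    # residue name, e.g. "GLY"
    chain: str
    res_seq: int
    x: float
    y: float
    z: float
    b_factor: float  # temperature factor, in square angstroms
    element: str     # element symbol, e.g. "C"


def parse_pdb_atoms(lines) -> List[Atom]:
    """Read ATOM/HETATM records using the fixed PDB column layout."""
    atoms = []
    for line in lines:
        record = line[0:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue
        name = line[12:16].strip()
        # Element lives in columns 77-78; fall back to the first letter
        # of the atom name when older files leave it blank.
        element = line[76:78].strip() or name[:1]
        atoms.append(Atom(
            serial=int(line[6:11]),
            name=name,
            res_name=line[17:20].strip(),
            chain=line[21:22].strip(),
            res_seq=int(line[22:26]),
            x=float(line[30:38]),
            y=float(line[38:46]),
            z=float(line[46:54]),
            b_factor=float(line[60:66]),
            element=element,
        ))
    return atoms


def alpha_carbons(atoms: List[Atom]) -> List[Atom]:
    """CA carbon atoms in file order: candidate joint positions for a backbone spline."""
    return [a for a in atoms if a.name == "CA" and a.element == "C"]
```

In practice you would pass an open file, e.g. `parse_pdb_atoms(open(path))`, then map element symbols to radius and color tags, B-factors to a turbulence tag, and `alpha_carbons()` positions to joints. Real files have quirks (alternate locations, insertion codes, multiple models) that Biopython handles and this sketch ignores.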
Import CIF or PDB and build.py -- using biopython in C4D
When there is no PDB
What happens when it's not as simple as trawling the Protein Data Bank and PubMed for proteins or protein domains, and the macromolecule you're attempting to assemble includes an intrinsically unstructured, disordered region? Well, you're hopelessly lost 😳... OR you can write a script with some creative ideas on how to semi-randomly construct one. That's what I did for parts of the core protein and the interglobular domain (IGD), where aggrecanase snaps aggrecan off of hyaluronan, its anchor to the cell surface (although in hindsight, I treated the IGD's composition as more disordered than I should have). Based on the literature and common amino acid frequencies, I generated amino acid sequences with corresponding hypothetical backbone (φ/ψ torsion) angles and fed them, as JSON files, into UCSF Chimera to build the structures. In the end I imported novel PDB structures into C4D that were plausible and looked good enough to capture some compelling enzymatic events. The disordered segments of core protein, along with homolog PDBs for the globular domains, were arranged along a spline-wrap deformer setup with a dynamic spline.
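A hedged sketch of the semi-random approach: draw residues from a weighted alphabet, give each one a (φ, ψ) pair sampled around a common backbone region, and write everything to JSON for a structure-building step. The residue weights, region weights, jitter, and JSON layout are all placeholders made up for illustration; they are not the frequencies or format used in the project, and the Chimera side is not shown.

```python
import json
import random

# Placeholder residue weights, biased toward residues commonly described as
# disorder-promoting (P, E, S, Q, K, A, G). These are NOT measured frequencies.
RESIDUE_WEIGHTS = {"P": 3, "E": 3, "S": 3, "Q": 2, "K": 2, "A": 2, "G": 2,
                   "T": 1, "D": 1, "R": 1, "N": 1, "L": 1, "V": 1}

# Approximate (phi, psi) centres in degrees for common backbone regions.
REGIONS = {
    "PPII": (-75.0, 145.0),
    "beta": (-120.0, 130.0),
    "alpha": (-63.0, -43.0),
}
REGION_WEIGHTS = {"PPII": 0.5, "beta": 0.3, "alpha": 0.2}  # placeholders


def wrap_angle(deg):
    """Wrap an angle into the interval [-180, 180)."""
    return (deg + 180.0) % 360.0 - 180.0


def disordered_segment(length, seed=0, jitter=15.0):
    """Return a list of {"residue", "phi", "psi"} dicts for a random coil segment."""
    rng = random.Random(seed)
    residues = rng.choices(list(RESIDUE_WEIGHTS),
                           weights=list(RESIDUE_WEIGHTS.values()), k=length)
    region_names = list(REGION_WEIGHTS)
    region_w = [REGION_WEIGHTS[r] for r in region_names]
    out = []
    for res in residues:
        region = rng.choices(region_names, weights=region_w)[0]
        phi0, psi0 = REGIONS[region]
        phi = wrap_angle(rng.gauss(phi0, jitter))
        psi = wrap_angle(rng.gauss(psi0, jitter))
        if res == "P":
            # Proline's ring restricts phi to roughly -65 degrees.
            phi = wrap_angle(rng.gauss(-65.0, 10.0))
        out.append({"residue": res, "phi": round(phi, 1), "psi": round(psi, 1)})
    return out


def to_json(segment):
    """Serialise a segment with its one-letter sequence (hypothetical layout)."""
    sequence = "".join(r["residue"] for r in segment)
    return json.dumps({"sequence": sequence, "residues": segment}, indent=2)
```

Seeding the generator keeps a given segment reproducible, which matters when the resulting structure has already been rigged and you only want to re-export it.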

Fast forward to mid-2021, and now we have AlphaFold predicting and publishing structures for nearly the entire human proteome. In the case of the aggrecan core protein, its prediction of a loose, unstructured IGD looks similar to mine. But without any of the hundreds of glycosaminoglycans populating the core, the whole thing crumbles into a semi-globular pile.
On a more confessional note, the aggrecanase enzyme I chose as the molecule of interest is actually only the catalytic domain of the enzyme, taken from the crystallographic PDB structure. Had AlphaFold been available (or had I attempted to construct the full aggrecanase with all its disparate domains on my own), the enzyme portrayed would have been over four times as large, and it would have included crucial domains predicted to be involved in binding to the IGD. So again, let's just say I sacrificed accuracy for clarity… but given the chance to do it again, I would include the whole protein, with its predicted loops and all.
creating backbone torsion angles for IGD sequence --> json --> import to build pdb in UCSF Chimera
Hair Hair Hair
While I first attempted to use MoSplines cloned to point selections on the aggrecan core for the keratan sulfate and chondroitin sulfate glycosaminoglycan (GAG) side chains, the whip-like action was just a little too creepy for my taste, and it seemed untamable no matter how much friction was added. After that, all the GAGs, including the hyaluronan chains, were constructed from GAG disaccharide units. These were point clouds cloned along pinned hair splines. The dynamic animation came from turbulent forces, random sugar spinning, and small random motions of the cloned atoms. That was for the close-ups.
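The core of cloning disaccharide units along a hair spline is spacing them evenly by arc length and giving each one a random spin about the strand. Here is a minimal, engine-agnostic sketch, assuming the guide is given as a list of 3D points; `place_along_polyline` is a hypothetical helper, not a Cinema 4D API, since in the actual scene the cloner handled the placement.

```python
import math
import random


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _length(v):
    return math.sqrt(sum(c * c for c in v))


def place_along_polyline(points, spacing, seed=0):
    """Return (position, unit tangent, spin angle in radians) every `spacing` units of arc length."""
    if spacing <= 0:
        raise ValueError("spacing must be positive")
    rng = random.Random(seed)
    placements = []
    next_s = 0.0   # arc length at which the next unit goes
    walked = 0.0   # arc length at the start of the current segment
    for a, b in zip(points, points[1:]):
        seg = _sub(b, a)
        seg_len = _length(seg)
        if seg_len == 0.0:
            continue  # skip duplicate points
        tangent = tuple(c / seg_len for c in seg)
        while next_s <= walked + seg_len:
            u = (next_s - walked) / seg_len
            pos = tuple(pa + u * sc for pa, sc in zip(a, seg))
            # Random spin about the tangent, standing in for the "random sugar spinning".
            placements.append((pos, tangent, rng.uniform(0.0, 2.0 * math.pi)))
            next_s += spacing
        walked += seg_len
    return placements
```

Each placement's tangent and spin angle together define the unit's orientation; re-running with the same seed gives the same spins, so only the dynamic guide itself moves between frames.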
For the long, flowing background molecules, i.e. hyaluronan and aggrecan, I'm also using hair, but only hair. Rather than suffer the tremendous slowdown from cloning atoms onto dozens of dynamic point clouds, I created all of the geometry in the hair object, with thickness curves for the profile of the aggrecan globular domains and noise-driven displacement in the material to mimic thermal motion.
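The low-poly background aggrecan relies on a thickness curve: a thin strand that swells where the globular domains sit. One way to sketch such a profile is a base radius plus Gaussian bumps. The domain centres, widths, and radii below are hypothetical placeholders, not measured proportions, and the resulting (t, radius) samples would still need to be entered into a thickness curve by hand or by script.

```python
import math

# Hypothetical domain layout along a normalised strand (0 = N-terminus, 1 = C-terminus):
# (centre, width, extra radius). Placeholder values, not measured proportions.
AGGRECAN_BUMPS = [
    (0.03, 0.015, 1.0),  # G1
    (0.10, 0.015, 1.0),  # G2 (the IGD is the thin stretch between G1 and G2)
    (0.97, 0.015, 1.0),  # G3
]


def thickness(t, base=0.3, bumps=AGGRECAN_BUMPS):
    """Radius at normalised position t: a thin base strand plus Gaussian swellings."""
    r = base
    for centre, width, extra in bumps:
        r += extra * math.exp(-0.5 * ((t - centre) / width) ** 2)
    return r


def sample_profile(n=101, **kwargs):
    """Evenly spaced (t, radius) pairs, e.g. to key a thickness curve."""
    return [(i / (n - 1), thickness(i / (n - 1), **kwargs)) for i in range(n)]
```

Because the whole molecule lives in one hair strand with a profile, there is no per-atom geometry to evaluate, which is where the speedup over cloned point clouds comes from.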
mospline aggrecan whip
aggrecan low-poly hair
low-poly aggrecan geometry profile
unsuccessful enzyme collision counter
hydrolysis orchestration Xpresso graph

render every other frame, so plays faster (more accurate?)

more fun, less awe

Rigging the hydrolysis of Aggrecan
Leaving an uncoiled strand for the IGD made orchestrating the enzyme attack and hydrolysis of aggrecan a little easier. Nearly all of the animation is accomplished through dynamics and pose morphs. These animations are triggered and controlled by procedural timing set up in Xpresso, Cinema 4D's visual scripting system. The Xpresso nodes controlling the timing and interactions are fairly complex and intertwined.
Enzymes Attract/Repel to IGD
It took a while to choose a noise type for the 'swarming' aggrecanase enzymes that would attack the IGD. Because most molecules travel through their environment in a random walk (Brownian motion), a real enzyme would bounce back and forth against the IGD several times before either finding the cleavage site or being jostled away. This again is where visual clarity won over accuracy: the noisier attacks were just too jarring for this piece. In the interest of time, I allowed each unsuccessful enzyme to collide with twelve atoms of the IGD substrate (tallied in a Python Xpresso node using Python's collections.Counter class) before being repelled away.
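Here is a hedged reconstruction of the collision-counting idea in plain Python, outside of Xpresso. It assumes each frame supplies a list of (enzyme, atom) contact pairs and that "collide with twelve atoms" means twelve distinct substrate atoms; the class name and contact format are hypothetical, and the real node may have counted differently.

```python
from collections import Counter


class CollisionTally:
    """Count distinct substrate atoms each enzyme has touched; flag enzymes to repel."""

    def __init__(self, limit=12, successful=None):
        self.limit = limit
        self.successful = set(successful or ())  # enzymes allowed to reach the cleavage site
        self.seen = set()        # (enzyme_id, atom_id) pairs already counted
        self.counts = Counter()  # enzyme_id -> number of distinct atoms touched

    def update(self, contacts):
        """Feed one frame of (enzyme_id, atom_id) contacts; return enzymes that should now repel."""
        for pair in contacts:
            if pair not in self.seen:
                self.seen.add(pair)
                self.counts[pair[0]] += 1
        return sorted(
            enzyme for enzyme, n in self.counts.items()
            if n >= self.limit and enzyme not in self.successful
        )
```

Counting distinct atoms rather than raw contact events keeps an enzyme that rests against one atom for many frames from being repelled too early.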

dynamic enzymes aggressive noise type

early enzyme attack iteration with water following

Hydrolysis Orchestration
Here is the main event. Normally aggrecanase, the enzyme we're showcasing, is synthesized in the chondrocyte and released to assist in the cleavage of aggrecan from the hyaluronan threads. This is so the pericellular matrix can be remodeled as the cartilage shifts around in a growth phase or for other maintenance reasons. But this time, it's for not so great reasons, for pathological reasons… osteoarthritis. While this aggrecan will be released into the ECM and eventually siphoned away into the synovial fluid, it likely won't be replaced by newly synthesized aggrecan, which means less osmotic pressure in the ECM, degrading cartilage, and less shock absorption.
As far as enzymes go, this type is a metalloproteinase, in which a Zn²⁺-activated water molecule makes a nucleophilic attack on the carbonyl carbon of the scissile peptide bond, hydrolyzing it. This orchestration made significant use of action timers (which indicate state), Signal tags, spline dynamics, hair colliders, hair constraints, and parent constraints to guide the water molecule and substrate into the active site without intersecting geometry, and then to transfer and separate atoms and groups.
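The action timers essentially turn each cleavage event into a small state machine driven by the frame number. As a plain-Python analogue (stage names and frame counts are hypothetical, not the project's values), a frame can be mapped to the current stage and a 0 to 1 progress value that could drive a pose morph strength or fire a signal when a stage begins:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical stage names and durations (in frames) for one cleavage event.
STAGES: List[Tuple[str, int]] = [
    ("approach", 48),   # enzyme drifts onto the IGD
    ("dock", 24),       # pose morph into the binding conformation
    ("water_in", 18),   # water guided to the Zn2+ in the active site
    ("attack", 12),     # nucleophilic attack on the carbonyl carbon
    ("separate", 36),   # atoms/groups reparented, fragments drift apart
    ("release", 0),     # terminal state (duration 0 means "stay here")
]


@dataclass
class Orchestrator:
    start_frame: int
    stages: List[Tuple[str, int]] = field(default_factory=lambda: list(STAGES))

    def state_at(self, frame):
        """Return (stage name, progress in [0, 1]) for a given frame."""
        if frame < self.start_frame:
            return ("idle", 0.0)
        elapsed = frame - self.start_frame
        for name, duration in self.stages:
            if duration == 0:
                return (name, 1.0)  # terminal stage
            if elapsed < duration:
                return (name, elapsed / duration)
            elapsed -= duration
        return (self.stages[-1][0], 1.0)
```

Making state a pure function of the frame (rather than something accumulated frame by frame) means scrubbing the timeline backwards gives the same result as playing forwards, which is convenient when many dynamics pieces depend on it.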
Representing the fidgetiness of molecules (B-factor vs. coarse-grained NMA: ENM, GNM):
These days, a computationally feasible and fairly accurate way of portraying the movement of atoms within a protein is to show the harmonic oscillations of its loops and domains using normal mode analysis (NMA) on an elastic network model (ENM), or on a Gaussian network model (GNM), a simpler and faster type of ENM. Rather than simulating a time course, these coarse-grained models describe the large-scale, low-frequency collective motions of the protein. On top of this, the motion of individual atoms can be simulated through computationally intensive all-atom molecular dynamics (MD) simulations, which typically cover nanoseconds to microseconds of motion (reaching milliseconds only on specialized hardware). Combining the two can get you close to the B-factor, or temperature factor, which reflects the atomic displacement (largely thermal motion) measured by X-ray crystallography. For the next molecular animation I make using the space-filling, CPK representation style, I plan to incorporate that level of simulation. But for this animation, I suppose you could say I took a shortcut. Instead of elastic springs and dynamic simulations, I remapped the B-factor value for each atom to drive a turbulent noise displacement of that atom's position. Without soft clamping on the upper values, the atoms would have appeared to be flying around, unbonded. So the fluctuating torsion angles of each residue and loop are not driving the motion here. The residues are, however, rigged to a joint system at the alpha carbons strung through a spline dynamics tag, and carefully sculpted poses (of the S1' loop) were chosen for the enzyme's binding conformation with the ligand from the PDB, adjusting joints to avoid steric clashes.
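A sketch of one way to do a B-factor remap with a soft clamp. It assumes the standard isotropic relation B = 8π²⟨u²⟩ to convert a B-factor (Å²) into an RMS displacement, squashes that with tanh so large values approach `max_amp` instead of exceeding it, and uses a sum of sines with random phases as a stand-in for Cinema 4D's turbulence noise. `max_amp`, `gain`, and all function names are hypothetical; this is not the project's actual setup.

```python
import math
import random

EIGHT_PI_SQ = 8.0 * math.pi ** 2


def rms_displacement(b_factor):
    """Isotropic B = 8*pi^2*<u^2>, so RMS displacement along one axis is sqrt(B / 8pi^2), in angstroms."""
    return math.sqrt(max(b_factor, 0.0) / EIGHT_PI_SQ)


def soft_clamped_amplitude(b_factor, max_amp=0.6, gain=1.0):
    """Soft clamp with tanh: roughly linear for small values, never exceeding max_amp."""
    return max_amp * math.tanh(gain * rms_displacement(b_factor) / max_amp)


def make_jitter(seed, n_waves=3):
    """Smooth, bounded stand-in for turbulence noise: per-axis sums of sines with random phases."""
    rng = random.Random(seed)
    waves = [[(rng.uniform(0.5, 2.0), rng.uniform(0.0, 2.0 * math.pi)) for _ in range(n_waves)]
             for _axis in range(3)]

    def jitter(t):
        # Averaging n_waves sines keeps each axis within [-1, 1].
        return tuple(sum(math.sin(f * t + p) for f, p in axis) / n_waves for axis in waves)

    return jitter


def displaced_position(pos, b_factor, jitter, t, **amp_kwargs):
    """Rest position plus noise scaled by the B-factor-derived amplitude."""
    amp = soft_clamped_amplitude(b_factor, **amp_kwargs)
    return tuple(c + amp * d for c, d in zip(pos, jitter(t)))
```

For example, B = 20 Å² corresponds to an RMS displacement of about 0.5 Å; with `max_amp=0.6` the tanh clamp trims that to roughly 0.41 Å, while very large B-factors level off near 0.6 Å rather than sending atoms flying. Seeding `make_jitter` per atom (say, by atom serial) gives each atom its own but repeatable wobble.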

Aggrecanase with B-factor turbulence and waters circulating (but not solvating). It's not very accurate, rather, suggestive.

Complicated Failure
At first I thought I may as well try to represent the disaccharide units as accurately as possible with a joint-rigged polygonal system with steric constraints and a series of random, directional, and time-offset effectors. The sheer reduction in speed when cloned hundreds or thousands of times was obvious and foreseeable, so I suppose this was really just an exercise in curiosity, before I figured out how to make the simple hyaluronan and chondroitin sulfate disaccharide PDBs and add the B-factor turbulence hack from above.

Dancing disaccharide phantoms

Collagen VI L-System
Collagen VI, featured in the pericellular matrix of the chondrocyte, is responsible for connecting and anchoring the rest of the collagen network, the proteoglycans, and the cell. Like all other types of collagen, it consists at base level of a right-handed triple helix of polyproline II-type helical polypeptides called α-chains. These monomers assemble into dimers and then tetramers, stabilized by disulfide bonds, which in turn associate end to end into beaded microfibrils. The linking and branching microfibrillar structure of collagen VI seemed like it would be an interesting design target for an L-System. So I did that, and boy was it a nice, complex series of rules with branching probabilities, custom user data, trig functions, and different domains being placed by the C4D cloners at different symbols… except it turns out that type VI collagen… does not branch, which I only discovered while doing further reading for this project summary. The photos (electron micrographs?) certainly made it look like it did, though. Lesson here… well, let's be honest, the col VI was such a background element that crisscrossing single strands would have looked practically the same. And I got pretty good practice at L-Systems too, despite the inaccurate branching assumptions.
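For anyone who wants to try the same kind of thing, here is a minimal stochastic L-system rewriter, where each symbol can map to several replacements chosen by probability. The rule set is a hypothetical toy (F for a fibril segment, + for a turn, [ and ] for pushing and popping a branch), not the rules used for the collagen VI network, and turtle interpretation into geometry is left to the 3D package.

```python
import random

# Hypothetical stochastic rules: symbol -> list of (probability, replacement).
# Symbols without rules (+, -, [, ]) are copied through unchanged.
RULES = {
    "F": [(0.75, "FF"), (0.25, "F[+F]F")],
}


def rewrite(axiom, rules, generations, seed=0):
    """Apply the stochastic rules `generations` times, starting from `axiom`."""
    rng = random.Random(seed)
    s = axiom
    for _ in range(generations):
        out = []
        for ch in s:
            options = rules.get(ch)
            if not options:
                out.append(ch)
                continue
            r = rng.random()
            cumulative = 0.0
            for prob, replacement in options:
                cumulative += prob
                if r < cumulative:
                    out.append(replacement)
                    break
            else:
                out.append(options[-1][1])  # guard against probabilities summing below 1
        s = "".join(out)
    return s
```

Given that collagen VI turns out not to branch, setting the branch probability to 0 (for example `{"F": [(1.0, "FF")]}`) yields the unbranched strands that would have been more accurate.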
Procedural textures for all but macro shots
Nearly every bit of surface geometry that isn't in a close-up shot was a candidate for adding major detail through texture in this piece, whether that meant animated noise-driven displacement on shaders for the background aggrecan, hyaluronan, or enzyme geometry, or using the king of procedural textures: Substance Designer. Substance is built for creating complicated, layered shapes within a material shader, which let me recess alternating triple-helix collagen fibrils into the fiber bundle, place a collagenous layer over the superficial zone cells of the articular cartilage, and create entire basement membrane fiber network layers for the chondrocytes. I was even able to recreate a highly displaced background collagen fiber nest for the chondrocyte when the camera entered the pericellular matrix. This saved tons of rendering time that would otherwise have gone to sampling the hundreds of textured, spline-based (render-time) tubes from the previous shot. When textures can replace complex or repetitive geometry, it's probably worth the time to jump into Substance.
articular cartilage
basement membrane
collagen type II
collagen II lacuna environment
A few takeaways
Above all, new technologies, especially AI tools accessible through the web, now make it possible to get complete protein assemblies in more and more cases. Thanks to AlphaFold, less time will be spent devising novel techniques for building out geometry. This means more time can be spent rigging the molecules, if you plan to show enzymatic reactions and the conformational changes of the enzyme and/or substrate. Automated rigging of molecules using Python scripts also just got that much easier with the advent of SDK/API-aware LLMs like OpenAI's ChatGPT (GPT-4), although none seem capable of writing a new script on their own without errors yet. While rigging won't achieve the experimental accuracy of a pure simulation, it does result in something that I think looks better, and it remains art-directable (simulations can always be there for reference). And of course you can go overboard with the rigging too. But in most cases, procedural texturing (with displacement, possibly animated if it's not that far from camera), the versatile hair object, and sometimes even L-systems should be the go-to methods for filling in background elements. That is, if project speed and viewport navigation are a priority. And why wouldn't they be, unless you are just making a still render rather than an animation?
Please let me know if you've noticed any mistakes or oversights, or if anything needs clarifying: geoff@picodesic.com