In this issue:
Welcome back to your weekly dose of AI news for Life Science!
This week, we have some exciting new models lined up for you:
DPLM-2: A Multimodal Diffusion Protein Language Model 💊
MedEmbed: Embedding for medical and clinical information retrieval 🚀
KinDEL: Public dataset poised to accelerate AI-driven drug discovery 💿
OncoGAN: Synthetic tumor genome generation 🦀
DiffModeler: Macromolecular structure modelling for cryo-EM map 🧬
Dive into these game-changing innovations and explore how they are transforming the biotech and healthcare landscapes!
DPLM-2: A Multimodal Diffusion Protein Language Model 💊
DPLM-2 is a groundbreaking multimodal protein model that unifies sequence and structure generation by converting 3D coordinates into discrete tokens. It excels in various tasks, providing a unified approach to protein modelling without separate sequence and structure stages.
📌 Key Insights:
Beats other models such as RFDiffusion, FoldFlow and ESM3 across multiple benchmarks
Simultaneous support for both protein sequence and structure … and a permissive Apache license
DPLM-2 can be used for multiple tasks:
Unconditional protein generation
Conditional protein generation
Folding
Inverse folding
Classification/Regression on downstream tasks such as HumanPPI, Thermostability and Metal Ion Binding
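To make the idea of "converting 3D coordinates into discrete tokens" a bit more concrete, here is a toy sketch of nearest-codebook quantization. Everything in it is an assumption for illustration: the two-number per-residue features, the three-entry codebook and the function names are made up, and this is not DPLM-2's actual tokenizer, which is learned from data.

```python
import math

# Hypothetical codebook: each entry is a 2-D "structure feature"
# (think of two backbone angles in radians). A real tokenizer would
# learn a much larger codebook over richer features.
CODEBOOK = [
    (-1.0, -0.8),  # token 0 (illustrative, helix-like angles)
    (-2.1, 2.3),   # token 1 (illustrative, strand-like angles)
    (1.0, 0.5),    # token 2 (illustrative, everything else)
]

def tokenize(features):
    """Map each feature vector to the index of its nearest codebook entry."""
    tokens = []
    for vec in features:
        distances = [math.dist(vec, code) for code in CODEBOOK]
        tokens.append(distances.index(min(distances)))
    return tokens

def detokenize(tokens):
    """Recover the (lossy) feature vector that each token stands for."""
    return [CODEBOOK[t] for t in tokens]

# Made-up features for a four-residue fragment.
residue_features = [(-1.1, -0.7), (-2.0, 2.2), (0.9, 0.4), (-1.05, -0.9)]
structure_tokens = tokenize(residue_features)
reconstructed = detokenize(structure_tokens)
```

Once structure is a sequence of integers like this, it can sit next to amino-acid tokens in a single language-model vocabulary, which is the general idea behind handling sequence and structure in one model.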
MedEmbed: Embedding for medical and clinical information retrieval 🚀
Medical information is inherently complex, characterised by i) specialised terminology, ii) contextual nuances and iii) evolving knowledge. MedEmbed was developed to overcome these issues and open up a lot of real-world applications for GenAI, such as improved clinical decisions and EHR data extraction.
📌 Key Insights:
Strong training with 1000s of PubMed Central clinical/medical notes, 10,000s of synthetic pairs using LLMs and 100,000s of training triplets with hard negative mining!
MedEmbed beats most general-purpose models across multiple benchmarks
A permissive Apache 2.0 license allowing for commercial use
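For readers new to "training triplets with hard negative mining", here is a minimal sketch of the general technique, not MedEmbed's actual training code. The 2-D vectors, the margin value and the function names are all assumptions chosen for illustration; real embeddings have hundreds of dimensions and come from the model being trained.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mine_hard_negative(query, negatives):
    """Pick the irrelevant passage that looks most similar to the query."""
    return max(negatives, key=lambda n: cosine(query, n))

def triplet_loss(query, positive, negative, margin=0.2):
    """Zero once the positive beats the negative by at least `margin`."""
    return max(0.0, margin - cosine(query, positive) + cosine(query, negative))

# Made-up embeddings: one query, one relevant passage, three irrelevant ones.
query = [1.0, 0.0]
positive = [0.9, 0.1]
negatives = [[0.0, 1.0], [0.8, 0.6], [-1.0, 0.0]]

hard_negative = mine_hard_negative(query, negatives)
hard_loss = triplet_loss(query, positive, hard_negative)
easy_loss = triplet_loss(query, positive, [0.0, 1.0])
```

The easy negative gives no loss and therefore no training signal, while the hard one still does. That is why mining hard negatives tends to matter for retrieval quality.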
KinDEL: Public DNA-Encoded Library dataset 💿
Insitro released KinDEL, one of the few publicly available raw DNA-Encoded Library (DEL) datasets. DELs are a powerful tool in drug discovery, enabling efficient screening of small molecule libraries against therapeutically relevant targets.
📌 Key Insights:
80 million small molecules tested against two kinase targets, MAPK14 and DDR1
The data is released alongside a set of benchmarks, including DEL-Compose
OncoGAN: Synthetic tumor genome generation 🦀
One of the main limitations for training models is data accessibility. There is never enough data. This year we have witnessed a lot of models and tools to generate data across different domains. Today we introduce OncoGAN, one of the first tools for in silico generation of cancer genomes that captures the mutational diversity of real cancer genomes.
📌 Key Insights:
Generated 800 simulated, highly realistic cancer genomes (in VCF format), now freely available for download.
Support for 8 cancer types across prostate, liver and breast
OncoGAN excels at mimicking tumour heterogeneity and tissue-specific mutational patterns
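If you want to explore genomes distributed as VCF files, a first step is often to tally the variants by type. The sketch below assumes plain tab-separated VCF text with the standard leading columns (CHROM, POS, ID, REF, ALT, ...). The example records are made up for illustration and are not OncoGAN output, and the function name is hypothetical.

```python
from collections import Counter

# Tiny made-up VCF excerpt, for illustration only.
EXAMPLE_VCF = """\
##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO
chr1\t12345\t.\tC\tT\t.\tPASS\t.
chr1\t67890\t.\tG\tA\t.\tPASS\t.
chr2\t1111\t.\tA\tAT\t.\tPASS\t.
chr3\t2222\t.\tC\tA,T\t.\tPASS\t.
"""

def count_variant_types(vcf_text):
    """Count single-nucleotide substitutions (by REF>ALT) and other variants."""
    counts = Counter()
    for line in vcf_text.splitlines():
        # Skip blank lines, '##' meta lines and the '#CHROM' header line.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        ref = fields[3]
        # ALT may list several alleles separated by commas.
        for alt in fields[4].split(","):
            if len(ref) == 1 and len(alt) == 1:
                counts[f"SNV {ref}>{alt}"] += 1
            else:
                counts["indel/other"] += 1
    return counts

variant_counts = count_variant_types(EXAMPLE_VCF)
```

Comparing tallies like these between simulated and real tumours is one simple way to sanity-check tissue-specific mutational patterns. For real analyses, a dedicated VCF library is a safer choice than hand-rolled parsing.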
DiffModeler: Macromolecular structure modelling for cryo-EM map 🧬
Another week and another model for protein structure prediction! Today we present DiffModeler, a diffusion model that combines single-chain protein structures predicted by AlphaFold2 with cryo-EM data to build full protein complex structures, for maps with resolutions in the 0-20 Å range!