
Feature request: support chemical prediction and generation models with string representations #49

Open
linjing-lab opened this issue Nov 23, 2024 · 2 comments

Comments

@linjing-lab

I noticed tracel-ai through the Burn framework; this software is well suited to high-performance prediction workloads, such as robotics or prediction over a data lake. Several pretrained molecular models use RoBERTa as their base model, such as ChemBERTa, ChemBERTa-2, MFBERT, SELFormer, and Semi-RoBERTa. Several pretrained protein models likewise build on RoBERTa, such as ESM-1b, ESM-2, PromptProtein, and KeAP. These are encoder-only tasks that are compatible with tracel-ai's models from an inference-performance perspective, so I recommend the repository provide Burn-based string-sequence examples for molecules, proteins, genomics, and multi-modal datasets.

This repository has a CRAFT model that might be used in structure-based tasks, but its design is not clear enough for realistic applications such as MolCRAFT, which works in a continuous parameter space for drug design. Chemical compatibility is constrained mainly by how faithfully the string characters are interpreted, not only by abstract model design. Abstract interpretation can always export new distributed operators that reflect machine memory and time; I think tracel-ai should feature more decoding tasks and seek low memory use when mapping strings into a continuous space. Multi-objective chemical prediction could then happen in a single pipeline, from interpretation through to a distributed streaming pattern.

@antimora
Collaborator

antimora commented Nov 25, 2024

@linjing-lab Thanks for the feedback. So that we understand your request, can you confirm that the following issue description is accurate? This is my interpretation of the issue you raised.


Title: Support for Chemical and Biological Sequence Models Utilizing String Representations

Description:

To enhance the repository's applicability in cheminformatics and bioinformatics, it is proposed to integrate models capable of processing chemical and biological sequences represented as strings. This includes handling molecular structures via SMILES (Simplified Molecular Input Line Entry System) and protein sequences through amino acid representations.
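For illustration, a minimal character-level SMILES tokenizer in Rust is sketched below. All names here are hypothetical, and real models such as ChemBERTa use learned subword (BPE) vocabularies rather than single characters, so this only shows the shape of the string-to-ids step that such models would need on the Rust side.

```rust
use std::collections::HashMap;

/// Hypothetical character-level SMILES tokenizer (illustration only;
/// production models like ChemBERTa use learned subword vocabularies).
struct SmilesTokenizer {
    vocab: HashMap<char, usize>,
    unk_id: usize,
}

impl SmilesTokenizer {
    /// Build a vocabulary over a small SMILES alphabet.
    fn new() -> Self {
        let alphabet = "CNOPSFIBrcl()[]=#+-123456789@/\\";
        let vocab: HashMap<char, usize> = alphabet
            .chars()
            .enumerate()
            .map(|(i, c)| (c, i))
            .collect();
        let unk_id = vocab.len(); // last id reserved for unknown characters
        Self { vocab, unk_id }
    }

    /// Map a SMILES string to a sequence of token ids, one per character.
    fn encode(&self, smiles: &str) -> Vec<usize> {
        smiles
            .chars()
            .map(|c| *self.vocab.get(&c).unwrap_or(&self.unk_id))
            .collect()
    }
}

fn main() {
    let tokenizer = SmilesTokenizer::new();
    // Aspirin in SMILES notation.
    let ids = tokenizer.encode("CC(=O)Oc1ccccc1C(=O)O");
    println!("{ids:?}");
}
```

Note that character-level splitting breaks two-letter atoms such as Br and Cl into separate tokens, which is one reason real SMILES tokenizers use regex- or vocabulary-based splitting instead.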

Proposed Enhancements:

  1. Incorporate Molecular Models:

    • Develop and include models similar to ChemBERTa, ChemBERTa-2, MFBERT, SELFormer, and Semi-RoBERTa, which are based on the RoBERTa architecture and designed for molecular data processing (a rough Burn-based sketch follows this list).
  2. Integrate Protein Sequence Models:

    • Add models akin to ESM-1b, ESM-2, PromptProtein, and KeAP, which utilize RoBERTa for protein sequence analysis.
  3. Enhance Existing Models:

    • Refine the current CRAFT model to improve its design clarity and functionality, enabling support for continuous parameter spaces in drug design, similar to MolCRAFT.
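To make the request concrete, here is a rough sketch of how an encoder-only sequence model over tokenized SMILES or amino acid strings might be assembled from Burn's building blocks. This is not an existing model in this repository, and Burn's config and `init` signatures change between versions, so treat the exact API calls (`EmbeddingConfig`, `TransformerEncoderConfig`, `TransformerEncoderInput`) as assumptions based on recent releases.

```rust
use burn::module::Module;
use burn::nn::transformer::{TransformerEncoder, TransformerEncoderConfig, TransformerEncoderInput};
use burn::nn::{Embedding, EmbeddingConfig};
use burn::tensor::{backend::Backend, Int, Tensor};

/// Hypothetical encoder-only model for tokenized SMILES or amino acid
/// sequences (positional embeddings omitted for brevity).
#[derive(Module, Debug)]
pub struct SequenceEncoder<B: Backend> {
    token_embedding: Embedding<B>,
    encoder: TransformerEncoder<B>,
}

impl<B: Backend> SequenceEncoder<B> {
    /// Assumed config API; exact signatures depend on the Burn version.
    pub fn new(vocab_size: usize, device: &B::Device) -> Self {
        let d_model = 256;
        Self {
            token_embedding: EmbeddingConfig::new(vocab_size, d_model).init(device),
            // Arguments: d_model, d_ff, n_heads, n_layers.
            encoder: TransformerEncoderConfig::new(d_model, 1024, 8, 6).init(device),
        }
    }

    /// [batch, seq_len] token ids -> [batch, seq_len, d_model] features.
    pub fn forward(&self, tokens: Tensor<B, 2, Int>) -> Tensor<B, 3> {
        let embedded = self.token_embedding.forward(tokens);
        self.encoder.forward(TransformerEncoderInput::new(embedded))
    }
}
```

Pretrained RoBERTa-style weights (ChemBERTa, ESM-2, and the like) would additionally need a weight-conversion path into Burn's record format, which is probably the larger part of the work.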

Objective:

These enhancements aim to broaden the repository's utility in fields such as drug discovery and genomics by providing high-performance models built with the Burn framework, capable of efficient inference on molecular and protein sequence data.

@linjing-lab
Author

From a system-execution standpoint, unified pipelines connect the database to its concrete analysis and align downstream tasks through checkpoints. A model compiled for a specific usage can then serve as a new end-to-end checkpoint over simple daily distributions. The enhancements aim to produce interpretable models through capable training and to deploy that clarity and functionality efficiently at inference. Statistical collection always reveals latent contributions in the datasets, which helps align data citation and mapping rules, and selective models built from them can control datasets as they grow. I think Rust pretrained models need to be applied at scale to system sequences; searching across real-world topics enables self-sustaining applications under continuously abstractive rules.
