Feature: chemical models with prediction and generation support via string representations #49
@linjing-lab Thanks for the feedback. To make sure we understand your request, can you confirm that the following issue description is accurate? It is my interpretation of the issue you raised.

Title: Support for Chemical and Biological Sequence Models Utilizing String Representations

Description: To enhance the repository's applicability in cheminformatics and bioinformatics, it is proposed to integrate models capable of processing chemical and biological sequences represented as strings. This includes handling molecular structures via SMILES (Simplified Molecular Input Line Entry System) and protein sequences via amino acid representations.

Proposed Enhancements:
Objective: These enhancements aim to broaden the repository's utility in fields such as drug discovery and genomics by providing high-performance models built with the Burn framework, capable of efficient inference on molecular and protein sequence data.
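To ground the string-representation side of this proposal, here is a minimal tokenizer sketch in Rust. It is not part of this repository, and the token rules are deliberately simplified: bracket atoms and the two-character elements Cl/Br are the only multi-character tokens, whereas a real port would have to match the exact vocabulary of the upstream model (for example ChemBERTa's tokenizer).

```rust
/// Minimal SMILES tokenizer sketch. Simplified rules; a real tokenizer
/// must reproduce the vocabulary of the pretrained model being ported.
fn tokenize_smiles(smiles: &str) -> Vec<String> {
    let chars: Vec<char> = smiles.chars().collect();
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        if chars[i] == '[' {
            // Bracket atoms such as [NH4+] become a single token.
            let mut j = i + 1;
            while j < chars.len() && chars[j] != ']' {
                j += 1;
            }
            let end = j.min(chars.len() - 1);
            tokens.push(chars[i..=end].iter().collect());
            i = end + 1;
        } else if i + 1 < chars.len()
            && matches!((chars[i], chars[i + 1]), ('C', 'l') | ('B', 'r'))
        {
            // Two-character organic-subset elements: Cl and Br.
            tokens.push(chars[i..i + 2].iter().collect());
            i += 2;
        } else {
            // Everything else (atoms, bonds, ring closures, branches) is one char.
            tokens.push(chars[i].to_string());
            i += 1;
        }
    }
    tokens
}

fn main() {
    // Aspirin as SMILES.
    println!("{:?}", tokenize_smiles("CC(=O)Oc1ccccc1C(=O)O"));
    // Bracket atom and two-character element examples.
    println!("{:?}", tokenize_smiles("[NH4+].[Cl-]"));
    println!("{:?}", tokenize_smiles("ClCBr"));
}
```

Protein sequences are simpler in this respect, since the standard amino-acid alphabet is one character per residue.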
From a system-execution perspective, unified pipelines connect a database with its concrete analysis and with downstream alignments through checkpoints. A model compiled for a specific usage can then be served as a new end-to-end checkpoint drawn from simple daily distributions. These enhancements try to make trained models interpretable and to deploy them for inference with clarity and efficiency. Statistical collections always reveal latent contributions in datasets, which helps align quoted data with mapping rules and select the models that control datasets as they grow latently. I think pretrained Rust models need to apply at scale to system sequences, so that searching across real-world topics activates self-contained applications under continuously abstractive rules.
Note the Burn framework from tracel-ai: this software can substitute for existing stacks in high-performance prediction, for example in robotics or prediction over a data lake. Several molecular pretrained models use RoBERTa as their base model, such as ChemBERTa, ChemBERTa-2, MFBERT, SELFormer, and Semi-RoBERTa. Several protein pretrained models likewise build on RoBERTa, such as ESM-1b, ESM-2, PromptProtein, and KeAP. These are encoder-only tasks that are compatible with the models in tracel-ai from an inference-performance perspective, so I recommend that the repository provide Burn-based multi-string examples for molecules, proteins, genomics, and multi-modal level sets (see the encoder skeleton sketched below).
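To make the encoder-only direction concrete, here is a rough sketch of the skeleton such a port might take in Burn. Everything here is an assumption for illustration: the `SequenceEncoder` name and the embedding/projection split are hypothetical, the actual transformer blocks and pretrained-weight loading are omitted, and the `burn::nn` calls follow the 0.x API and should be checked against the current release.

```rust
use burn::{
    module::Module,
    nn::{Embedding, EmbeddingConfig, Linear, LinearConfig},
    tensor::{backend::Backend, Int, Tensor},
};

/// Hypothetical skeleton of an encoder-only sequence model (a
/// ChemBERTa/ESM-style port). The transformer blocks and pretrained
/// weight loading that a real port needs are omitted.
#[derive(Module, Debug)]
pub struct SequenceEncoder<B: Backend> {
    embedding: Embedding<B>,
    projection: Linear<B>,
}

impl<B: Backend> SequenceEncoder<B> {
    pub fn new(device: &B::Device, vocab_size: usize, d_model: usize) -> Self {
        Self {
            embedding: EmbeddingConfig::new(vocab_size, d_model).init(device),
            projection: LinearConfig::new(d_model, d_model).init(device),
        }
    }

    /// tokens: [batch, seq_len] token ids -> [batch, seq_len, d_model]
    /// per-token embeddings, as an encoder-only model would expose them.
    pub fn forward(&self, tokens: Tensor<B, 2, Int>) -> Tensor<B, 3> {
        let embedded = self.embedding.forward(tokens);
        self.projection.forward(embedded)
    }
}
```

One plausible route for the weights is burn-import's ONNX path against an exported upstream checkpoint, though how well each listed model converts would need to be verified case by case.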
This repository has a CRAFT model that might be used for structure-based tasks, but its real-world design is not a clear fit, unlike MolCRAFT, which works in a continuous parameter space for drug design. Its chemical compatibility is constrained because it is built chiefly for script-character interpretation, not as an abstract design for such machine schedules. Abstract interpretation can always export new distributed abstract operators that reflect machine memory and time, so I think tracel-ai should feature more decode tasks and aim for low memory in the correlation from strings to continuous space (a sketch of such a decode loop follows below). Multi-objective chemical prediction could then happen in one possible pipeline, from explanation to a distributed stream pattern.
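As a sketch of what a decode task means here, the loop below performs greedy token-by-token generation over a toy SMILES-style vocabulary. The `score_next_token` function is a deterministic stand-in for a trained decoder's logits and is purely illustrative; a real implementation would call a Burn model and could swap greedy selection for beam search or sampling.

```rust
/// Stand-in for a decoder's next-token logits; purely illustrative.
/// A real implementation would run a trained Burn model on the prefix.
fn score_next_token(prefix: &[usize], vocab_size: usize) -> Vec<f32> {
    (0..vocab_size)
        .map(|t| ((prefix.len() * 3 + t * 5) % 11) as f32)
        .collect()
}

fn greedy_decode(vocab: &[&str], eos: usize, max_len: usize) -> String {
    let mut tokens: Vec<usize> = Vec::new();
    for _ in 0..max_len {
        let scores = score_next_token(&tokens, vocab.len());
        // Pick the highest-scoring token (greedy search).
        let next = scores
            .iter()
            .enumerate()
            .max_by(|a, b| a.1.partial_cmp(b.1).unwrap())
            .map(|(i, _)| i)
            .unwrap();
        if next == eos {
            break;
        }
        tokens.push(next);
    }
    tokens.iter().map(|&t| vocab[t]).collect()
}

fn main() {
    // Tiny SMILES-style vocabulary; index 6 is the end-of-sequence marker.
    let vocab = ["C", "N", "O", "(", ")", "=", "<eos>"];
    // Prints a short token string, e.g. "ON(C)N=C" with this toy scorer.
    println!("{}", greedy_decode(&vocab, 6, 16));
}
```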