
      1 Docking‐based screening where the likelihood of the molecule binding to the protein with appreciable affinity is estimated by an analysis of the interactions it may form with the protein [38–41].

      2 Shape‐based screening where the 3D shape of the ligand is compared with the shape of other known active ligands for the protein [42–45].

      3 Ligand‐based methods where a machine learning algorithm is trained on the 2D structures of known active molecules and then used to identify other molecules likely to exhibit affinity based on this earlier provided information [46, 47]; a minimal illustrative sketch of this approach follows the list.
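
      As a minimal illustration of the ligand‐based approach, the sketch below trains a simple classifier on 2D fingerprints of known actives and inactives and then ranks an unscreened library by predicted probability of activity. The SMILES strings and activity labels are hypothetical placeholders, and the choice of Morgan fingerprints with a random forest is only one reasonable descriptor/algorithm combination among the many used in practice.

```python
# Minimal sketch of a ligand-based virtual screen: train a classifier on the
# 2D structures of known actives and inactives, then rank an unscreened
# library by predicted probability of activity. All SMILES strings and
# activity labels below are hypothetical placeholders.
import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem
from sklearn.ensemble import RandomForestClassifier

def morgan_fp(smiles, radius=2, n_bits=2048):
    """2D circular (Morgan) fingerprint as a numpy array."""
    mol = Chem.MolFromSmiles(smiles)
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=n_bits)
    arr = np.zeros((n_bits,), dtype=np.int8)
    DataStructs.ConvertToNumpyArray(fp, arr)
    return arr

# Known actives (label 1) and inactives (label 0) for the target of interest.
train_smiles = ["CCOc1ccccc1C(=O)N", "c1ccccc1CCN", "CC(C)Cc1ccc(C)cc1", "OCCOCCO"]
train_labels = [1, 1, 0, 0]

model = RandomForestClassifier(n_estimators=500, random_state=0)
model.fit(np.array([morgan_fp(s) for s in train_smiles]), train_labels)

# Score an unscreened library and rank by predicted probability of activity.
library_smiles = ["CCOc1ccccc1C(=O)NC", "CCCCCCCC", "c1ccccc1CCNC(=O)C"]
scores = model.predict_proba(np.array([morgan_fp(s) for s in library_smiles]))[:, 1]
for smiles, score in sorted(zip(library_smiles, scores), key=lambda x: -x[1]):
    print(f"{score:.2f}  {smiles}")
```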

      It should be further mentioned that hybrid methods blurring the lines between these different approaches to virtual screening are under active development, and the above categories are meant more to orient the reader to this body of work than to provide a rigid taxonomy of methods [48]. Once a virtual screen has been performed, much like a traditional experimental screen, putative active molecules will be purchased or synthesized and then assayed to confirm their activity using lower‐throughput and more reliable experimental techniques. A successful virtual screening campaign will yield a diverse selection of hit molecules exhibiting affinity that provide good starting points for the discovery project to initiate hit‐to‐lead efforts.

      Rather than solely present experimental screening and virtual screening as competing techniques for finding hit molecules, we also wish to highlight that hybrid methods using both experimental techniques and the broad toolkit provided by modern computational methods are being investigated to unlock challenging targets. For example, multiple groups have been exploring whether the success rates of experimental DNA‐encoded library screens can be enhanced through application of machine learning technologies [49]. Likewise, free energy calculations of fragment linking might be able to increase the number of actionable hits identified in experimental fragment‐based screens [50]. We anticipate further development of such mixed computational/experimental approaches to hit finding to be a very productive future research direction.

      2.2.3 Hit‐to‐Lead and Lead Optimization

      Once developable hits have been identified, a central goal for the project team will be to synthesize and characterize sets of congeneric compounds with developable structure–activity relationships and demonstrated in vivo efficacy, i.e. a lead series. Once a lead series is identified, the project team will work to identify a development candidate molecule within that lead series manifesting the property profile required for that molecule to be advanced with confidence into preclinical development and ultimately clinical trials. In practice, this can be slow, expensive, and painstaking work in which many small chemical modifications of the initial hit molecules are synthetically explored and profiled in a variety of in vitro and in vivo experimental assays, helping the project team learn which combinations of modifications to the initial hits might lead to a molecule manifesting the desired property profile.

      A major step forward has been the development of advanced computational methods to explicitly enumerate many of the chemical modifications a project team might consider to improve the chemical matter, and to accurately score the properties of these ideated molecules. The enumeration of project‐relevant design ideas can be performed by a variety of methods, including classical rule‐based approaches as well as more modern machine‐learning‐ and deep‐learning‐based techniques and hybrid approaches [56–63]; a minimal rule‐based sketch is given below, after the overview of scoring methods. Of equal importance to the ability to enumerate such synthesizable and project‐relevant design ideas is the ability to accurately evaluate their properties to facilitate the prioritization of this pool of molecules for synthesis. There are three major classes of scoring methods in common use:

      1 Ligand‐based methods where a machine learning algorithm is provided the 2D structures of molecules with experimentally measured properties to train a model that can be used to predict the properties of other molecules based on this earlier provided information [64–67].

      2 Approximate physics‐based methods, such as MM‐GB/SA interaction energy analysis and molecular docking, in which the likelihood of the molecule manifesting a particular property of interest will be estimated on the basis of the interactions it may form with its environment [39–41, 68–75].

      3 Rigorous physics‐based methods, such as free energy calculations, where all atomistic contributions to the property of interest will be explicitly simulated [76–82]; a brief sketch of the underlying estimator follows this list.
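
      What distinguishes this class is that the predicted quantity follows from a statistical‐mechanical identity applied to the simulation output. As a minimal illustration, the sketch below implements the Zwanzig (exponential averaging) estimator that underlies free energy perturbation for a single alchemical window; the per‐frame energy differences are hypothetical numbers, whereas a production calculation of the kind cited above would obtain them from alchemical molecular dynamics simulations and combine many such windows.

```python
# Minimal sketch of the Zwanzig (exponential averaging) estimator that
# underlies free energy perturbation:
#     dF(A -> B) = -kT * ln< exp(-(U_B - U_A) / kT) >_A
# A production calculation would obtain the per-frame energy differences from
# an alchemical molecular dynamics simulation and accumulate such estimates
# over many intermediate "lambda" windows; the samples below are hypothetical
# numbers used only to show the estimator itself.
import numpy as np

KB = 0.0019872041  # Boltzmann constant in kcal/(mol*K)

def zwanzig_delta_f(delta_u, temperature=300.0):
    """Free energy difference (kcal/mol) from samples of U_B - U_A drawn
    from the equilibrium ensemble of state A."""
    kt = KB * temperature
    shift = delta_u.min()  # subtract the minimum before exponentiating
    return shift - kt * np.log(np.mean(np.exp(-(delta_u - shift) / kt)))

# Hypothetical per-frame energy differences (kcal/mol) for one lambda window.
rng = np.random.default_rng(0)
delta_u = rng.normal(loc=0.8, scale=0.5, size=5000)
print(f"Estimated dF for this window: {zwanzig_delta_f(delta_u):.2f} kcal/mol")
```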

      Each of these scoring methods has different strengths and weaknesses and attempts to balance prediction accuracy, computational efficiency, and breadth of applicability.
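
      Returning briefly to the enumeration step described above, the sketch below shows what a classical rule‐based approach can look like in miniature: a single hypothetical transformation rule, written as a reaction SMARTS, is applied to a hit molecule to propose close‐in analogs. Production tools of the kind cited above typically combine many such rules with machine‐learned generative models and synthesizability filters.

```python
# Minimal sketch of classical rule-based design-idea enumeration: apply one
# transformation rule, written as a reaction SMARTS, to a hit molecule to
# propose close-in analogs. The hit SMILES and the rule are hypothetical
# placeholders used only for illustration.
from rdkit import Chem
from rdkit.Chem import AllChem

# Rule: methylate the nitrogen of a secondary amide.
rule = AllChem.ReactionFromSmarts("[C:1](=[O:2])[NH1:3]>>[C:1](=[O:2])[N:3]C")

hit = Chem.MolFromSmiles("CCOc1ccccc1C(=O)NC")  # hypothetical hit molecule

analogs = set()
for (product,) in rule.RunReactants((hit,)):
    try:
        Chem.SanitizeMol(product)
        analogs.add(Chem.MolToSmiles(product))
    except Exception:
        continue  # discard chemically invalid products

for smiles in sorted(analogs):
    print(smiles)
```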

      In contrast to ligand‐based methods, approximate physics‐based methods, such as molecular docking and MM‐GB/SA scoring, do not require the construction of training sets and explicit parametrization, and instead proceed by way of an energetic or three‐dimensional contact analysis to estimate whether the molecule under consideration for synthesis might achieve the discovery project design goals. This class of methods has been most extensively developed for protein–ligand binding affinity analysis, and can be utilized to exclude molecules that are grossly sterically or electrostatically incompatible with the ligand binding site of the protein of interest, thereby enriching the quality of molecules under consideration by the discovery project team [68, 69]. This class of methods has also been adapted to address other ADMET properties such as membrane permeability, human serum albumin (HSA) binding, and hERG blockade, among others [86–90]. These methods are typically much less computationally efficient than ligand‐based approaches and will often require seconds to minutes of CPU time per molecule. A key disadvantage of this class of methods is that quantitative prediction accuracy is typically not attainable; instead, the methods are generally used to triage large sets of molecules for further analysis, either by project team members or by way of more sophisticated computational techniques.
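
      The bookkeeping behind an MM‐GB/SA binding estimate is straightforward once the per‐frame energy terms are available. In practice the molecular‐mechanics, generalized Born, and surface‐area terms for the complex, receptor, and ligand are produced by a simulation package; the sketch below uses hypothetical per‐frame totals purely to illustrate the single‐trajectory arithmetic.

```python
# Minimal sketch of the single-trajectory MM-GB/SA bookkeeping:
#     dG_bind ~ <G_complex> - <G_receptor> - <G_ligand>,
# where each per-frame G is the molecular-mechanics energy plus the
# generalized Born and surface-area solvation terms (entropy corrections are
# often omitted or estimated separately). In practice these per-frame totals
# come from a simulation package; the arrays below are hypothetical numbers
# (kcal/mol) used only to show the arithmetic.
import numpy as np

def mmgbsa_estimate(g_complex, g_receptor, g_ligand):
    """Mean binding estimate and a crude standard error over the frames."""
    per_frame = g_complex - g_receptor - g_ligand
    return per_frame.mean(), per_frame.std(ddof=1) / np.sqrt(len(per_frame))

# Hypothetical per-frame totals (E_MM + G_GB + G_SA) for 100 snapshots taken
# from a trajectory of the protein-ligand complex.
rng = np.random.default_rng(1)
g_complex = rng.normal(-5000.0, 5.0, size=100)
g_receptor = rng.normal(-4200.0, 5.0, size=100)
g_ligand = rng.normal(-760.0, 2.0, size=100)

dg, err = mmgbsa_estimate(g_complex, g_receptor, g_ligand)
print(f"MM-GB/SA binding estimate: {dg:.1f} +/- {err:.1f} kcal/mol")
```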

      Rigorous physics‐based methods are distinguished from the approximate methods described above by the explicit, theoretically robust connection that can be made between the simulations performed and the experimentally measured quantities of interest. The benefit of this clear theoretical connection is that such methods can be quite accurate in practice, often with root‐mean‐square errors of ~1 kcal/mol, and do not require any system‐specific parametrization [76]. Such accuracy enables these methods to more robustly support discovery project decision making than would be possible with less accurate or less reliable techniques. This improved accuracy and reliability, however, comes at the price of a greatly increased computational cost, typically ~1 GPU day of simulation time per scored molecule. As with the earlier
