Empirical tests of the activity of compounds on biological systems are expensive, even for simple tests, and costs rise as test complexity increases. The choice of which empirical tests to run should therefore be made carefully, exploiting all available information to identify the tests most likely to return a positive result. Thus, even at the early stages of drug development, computational methods can provide valuable insight into which tests to run, leveraging historical experimental information to reduce costs. However, such methods are themselves computationally very expensive. Our research focuses on enabling efficient scaling of the relevant machine learning methods on HPC substrates for this problem, and on showing where algorithmic advances are needed to tackle large, industrially relevant data sets.

