A new adaptive L1-norm for optimal descriptor selection of high-dimensional QSAR classification model for anti-hepatitis C virus activity of thiourea derivatives

Z. Y. Algamal, M. H. Lee

Research output: Contribution to journal › Article

  • 2 Citations

Abstract

High-dimensional quantitative structure–activity relationship (QSAR) classification models typically contain a large number of irrelevant and redundant descriptors. This paper proposes a new descriptor selection method for estimating QSAR classification models, obtained by adding a new weight inside the L1-norm penalty. Experimental results on classifying the anti-hepatitis C virus activity of thiourea derivatives show that the proposed method performs effectively and competitively against existing penalized methods in terms of classification performance on both the training and the testing datasets. Moreover, the results of the stability test and the applicability domain analysis indicate that the resulting QSAR classification model is robust. The results suggest that the developed model could be employed in further high-dimensional QSAR classification studies.
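
The abstract does not state the form of the new weight, so the following is only a minimal sketch of the general idea of placing per-descriptor weights inside an L1 penalty for logistic regression. It assumes the standard adaptive-lasso weight, 1 / (|initial coefficient| + eps)^gamma, as a stand-in and is not the authors' weighting scheme. All function names, the toy data, and the parameter values (lam, gamma, eps, step, iters) are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fit_weighted_l1_logistic(X, y, lam, weights, step=0.5, iters=500):
    """Proximal gradient descent on
    mean logistic loss + lam * sum_j weights[j] * |beta_j|.

    X is a list of rows, y a list of 0/1 labels. The intercept is unpenalized.
    """
    n, p = len(X), len(X[0])
    intercept, beta = 0.0, [0.0] * p
    for _ in range(iters):
        # Residuals (predicted probability minus label) drive the gradient.
        resid = [
            sigmoid(intercept + sum(x * b for x, b in zip(row, beta))) - yi
            for row, yi in zip(X, y)
        ]
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / n for j in range(p)]
        intercept -= step * sum(resid) / n
        # Gradient step followed by a per-descriptor soft threshold.
        beta = [
            soft_threshold(b - step * g, step * lam * w)
            for b, g, w in zip(beta, grad, weights)
        ]
    return intercept, beta


def adaptive_l1_logistic(X, y, lam, gamma=1.0, eps=1e-6):
    """Two-stage fit: plain lasso first, then a reweighted L1 penalty.

    Assumption: generic adaptive-lasso weights, not the paper's weight.
    """
    p = len(X[0])
    _, initial = fit_weighted_l1_logistic(X, y, lam, [1.0] * p)
    weights = [1.0 / (abs(b) + eps) ** gamma for b in initial]
    intercept, beta = fit_weighted_l1_logistic(X, y, lam, weights)
    return initial, weights, intercept, beta


def make_toy_data(n=120, p=5, seed=0):
    """Synthetic stand-in for a descriptor matrix: only columns 0 and 1 matter."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [1 if rng.random() < sigmoid(2.0 * r[0] - 2.0 * r[1]) else 0 for r in X]
    return X, y
```

Calling `adaptive_l1_logistic(*make_toy_data(), lam=0.05)` returns the first-stage coefficients, the derived weights, and the reweighted fit. Descriptors with small first-stage coefficients receive large weights and are penalized more heavily in the second stage, which is the mechanism that lets a weighted L1 penalty discard irrelevant descriptors while shrinking relevant ones less.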

Original language: English
Pages (from-to): 75-90
Number of pages: 16
Journal: SAR and QSAR in Environmental Research
Volume: 28
Issue number: 1
DOIs: 10.1080/1062936X.2017.1278618
State: Published - 2 Jan 2017

Keywords

  • classification
  • lasso
  • penalized logistic regression
  • penalized method
  • QSAR

ASJC Scopus subject areas

  • Bioengineering
  • Molecular Medicine
  • Drug Discovery

Cite this

@article{ed8a2acb3bc84ae6ac12c156eb483c53,
title = "A new adaptive L1-norm for optimal descriptor selection of high-dimensional QSAR classification model for anti-hepatitis C virus activity of thiourea derivatives",
abstract = "A high-dimensional quantitative structure–activity relationship (QSAR) classification model typically contains a large number of irrelevant and redundant descriptors. In this paper, a new design of descriptor selection for the QSAR classification model estimation method is proposed by adding a new weight inside L1-norm. The experimental results of classifying the anti-hepatitis C virus activity of thiourea derivatives demonstrate that the proposed descriptor selection method in the QSAR classification model performs effectively and competitively compared with other existing penalized methods in terms of classification performance on both the training and the testing datasets. Moreover, it is noteworthy that the results obtained in terms of stability test and applicability domain provide a robust QSAR classification model. It is evident from the results that the developed QSAR classification model could conceivably be employed for further high-dimensional QSAR classification studies.",
keywords = "classification, lasso, penalized logistic regression, penalized method, QSAR",
author = "Algamal, {Z. Y.} and Lee, {M. H.}",
year = "2017",
month = "1",
doi = "10.1080/1062936X.2017.1278618",
volume = "28",
pages = "75--90",
journal = "SAR and QSAR in Environmental Research",
issn = "1062-936X",
publisher = "Taylor and Francis Ltd.",
number = "1"
}

TY - JOUR
T1 - A new adaptive L1-norm for optimal descriptor selection of high-dimensional QSAR classification model for anti-hepatitis C virus activity of thiourea derivatives
AU - Algamal, Z. Y.
AU - Lee, M. H.
PY - 2017/1/2
Y1 - 2017/1/2
N2 - A high-dimensional quantitative structure–activity relationship (QSAR) classification model typically contains a large number of irrelevant and redundant descriptors. In this paper, a new design of descriptor selection for the QSAR classification model estimation method is proposed by adding a new weight inside L1-norm. The experimental results of classifying the anti-hepatitis C virus activity of thiourea derivatives demonstrate that the proposed descriptor selection method in the QSAR classification model performs effectively and competitively compared with other existing penalized methods in terms of classification performance on both the training and the testing datasets. Moreover, it is noteworthy that the results obtained in terms of stability test and applicability domain provide a robust QSAR classification model. It is evident from the results that the developed QSAR classification model could conceivably be employed for further high-dimensional QSAR classification studies.

KW - classification
KW - lasso
KW - penalized logistic regression
KW - penalized method
KW - QSAR
UR - http://www.scopus.com/inward/record.url?scp=85011872861&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85011872861&partnerID=8YFLogxK
U2 - 10.1080/1062936X.2017.1278618
DO - 10.1080/1062936X.2017.1278618
M3 - Article
VL - 28
SP - 75
EP - 90
JO - SAR and QSAR in Environmental Research
T2 - SAR and QSAR in Environmental Research
JF - SAR and QSAR in Environmental Research
SN - 1062-936X
IS - 1
ER -