Conference papers

Genetic Programming over Spark for Higgs Boson Classification

Abstract: With the growing number of databases holding very large numbers of records, existing knowledge discovery tools need to be adapted and new tools need to be created. Genetic Programming (GP) has proven to be an efficient algorithm, in particular for classification problems. However, GP is hampered by its computing cost, which becomes more acute with large datasets. This paper presents how an existing GP implementation (DEAP) can be adapted by distributing evaluations on a Spark cluster. An additional sampling step is then applied to fit tiny clusters. Experiments are carried out on Higgs Boson classification with different settings. They show the benefits of using Spark as a parallelization technology for GP.
Document type: Conference papers

https://hal-univ-paris10.archives-ouvertes.fr/hal-02286136
Contributor: Sana Ben Hamida
Submitted on: Tuesday, September 17, 2019 - 4:58:15 PM
Last modification on: Friday, October 23, 2020 - 7:02:01 PM

File

HiggsSpark (1).pdf
Files produced by the author(s)

Citation

Hmida Hmida, Sana Ben Hamida, Amel Borgi, Marta Rukoz. Genetic Programming over Spark for Higgs Boson Classification. 22nd International Conference Business Information Systems, Jun 2019, Seville, Spain. pp.300-312, ⟨10.1007/978-3-030-20485-3_23⟩. ⟨hal-02286136⟩
