• Temitope O. Atoyebi, Department of Computer Science, Nasarawa State University Keffi, Nasarawa, Nigeria
• Rashidah F. Olanrewaju, Department of Computer Science, Nasarawa State University Keffi, Nasarawa, Nigeria
• N. V. Blamah, Department of Computer Science, Nasarawa State University Keffi, Nasarawa, Nigeria
Keywords: Multinomial Naïve Bayes, Random Forest, Proposed System, Gaussian Naïve Bayes, Malaria Outbreak


Malaria, a potentially fatal disease caused by Plasmodium parasites, remains a major global health concern. It affects millions of people worldwide, especially in tropical and subtropical regions, and is transmitted to humans through the bites of infected female Anopheles mosquitoes (the main vector), which feed on human blood for egg production. A malaria incidence classification model serves as an early detection mechanism that helps to monitor the spread of the disease. As a data-driven knowledge discovery system, it can help public health authorities understand how environmental and location factors affect health, and develop preventive and adaptive measures that deliver health services in time to save lives. Despite sustained investment and the eradication strategies initiated by the WHO, malaria incidence still shows an increasing trend in Sub-Saharan Africa, so there is a need for models with better predictive ability that account for seasonal and non-seasonal variations in the environment. This research proposes a machine learning-based model for classifying malaria incidence using environmental features across Nigeria over a period of five years. The work begins with a feature engineering process that identifies the environmental factors affecting malaria incidence, followed by a random forest process for outlier detection and a Multinomial Naïve Bayes algorithm for classification. The results suggest that, although the exact association between malaria incidence and environmental variability differs from one geographic region to another, seasonal changes (the rainy season and a decrease in temperature) also contributed significantly to malaria outbreaks. The proposed system (Multinomial Naïve Bayes) was compared with other classification models and outperformed both Random Forest and Gaussian Naïve Bayes.
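To make the classification step concrete, the following is a minimal sketch of a Multinomial Naïve Bayes classifier with Laplace smoothing, written in plain Python. It is an illustration only, not the authors' implementation: the feature names (rainy, cool, and dry days per month), the toy counts, the class labels, and the function names are all assumptions invented for this example, and the feature engineering and random forest outlier detection steps are omitted.

```python
import math
from collections import defaultdict


def train_multinomial_nb(X, y, alpha=1.0):
    """Fit class log-priors and Laplace-smoothed feature log-likelihoods."""
    n_features = len(X[0])
    class_counts = defaultdict(int)
    feature_totals = defaultdict(lambda: [0.0] * n_features)

    # Accumulate per-class sample counts and per-class feature totals.
    for row, label in zip(X, y):
        class_counts[label] += 1
        for j, count in enumerate(row):
            feature_totals[label][j] += count

    model = {}
    for label, n in class_counts.items():
        totals = feature_totals[label]
        denom = sum(totals) + alpha * n_features
        log_prior = math.log(n / len(y))
        log_likelihood = [math.log((t + alpha) / denom) for t in totals]
        model[label] = (log_prior, log_likelihood)
    return model


def predict(model, row):
    """Return the class with the highest posterior log-score for one sample."""
    def score(label):
        log_prior, log_likelihood = model[label]
        return log_prior + sum(c * ll for c, ll in zip(row, log_likelihood))

    return max(model, key=score)


# Hypothetical toy data: [rainy_days, cool_days, dry_days] per month.
# These numbers are made up for illustration and are not from the study.
X_train = [
    [20, 15, 2], [18, 12, 4], [22, 14, 1],   # months labelled "high" incidence
    [3, 4, 25], [5, 2, 22], [2, 5, 26],      # months labelled "low" incidence
]
y_train = ["high", "high", "high", "low", "low", "low"]

model = train_multinomial_nb(X_train, y_train)
print(predict(model, [19, 13, 3]))   # a wet, cool month
print(predict(model, [4, 3, 24]))    # a dry month
```

Multinomial Naïve Bayes expects non-negative count-like inputs, which is why the sketch assumes the environmental measurements have already been converted to counts; continuous measurements such as raw temperature would need binning first or a Gaussian variant instead.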