Description
This dataset contains sets of English-language news article segments related to three domains: animal health, food security and climate change. It has been used to fine-tune and evaluate the GPT-3.5-turbo, GPT-4o, DeepSeek-V3, DeBERTa, RoBERTa, BERT, EpidBioELECTRA and EpidGPT models for novelty detection in these three domains. It comprises 10,660 animal health, 1,100 food security and 2,200 climate change article segments in CSV format. Each record holds the segment text together with information about its parent article, in the fields segment, doc id, seg id, title, source url, publication date, article domain and article subdomain.
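A minimal sketch of reading the segments with Python's standard `csv` module and regrouping them into their parent articles. It assumes the CSV header uses the field names listed above verbatim (`segment`, `doc id`, `seg id`, ...) and that `seg id` is an integer giving the segment's position within its article; both are assumptions, as are the helper names `load_segments`, `count_by_domain` and `rebuild_articles`. The sample rows are invented placeholders, not records from the dataset.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical sample: the header assumes the field names given in the description;
# the rows are invented placeholders, not real dataset records.
SAMPLE_CSV = """segment,doc id,seg id,title,source url,publication date,article domain,article subdomain
"Second placeholder segment.",d1,2,Placeholder title,https://example.org/a,2024-01-01,animal health,ASF
"First placeholder segment.",d1,1,Placeholder title,https://example.org/a,2024-01-01,animal health,ASF
"Another placeholder segment.",d2,1,Other title,https://example.org/b,2024-02-01,climate change,drought
"""


def load_segments(fh):
    """Read segment rows as dictionaries keyed by the CSV header."""
    return list(csv.DictReader(fh))


def count_by_domain(rows):
    """Count segments per article domain."""
    return Counter(row["article domain"] for row in rows)


def rebuild_articles(rows):
    """Group segments by parent document, ordered by (assumed integer) seg id."""
    by_doc = defaultdict(list)
    for row in rows:
        by_doc[row["doc id"]].append(row)
    return {
        doc_id: [r["segment"] for r in sorted(segs, key=lambda r: int(r["seg id"]))]
        for doc_id, segs in by_doc.items()
    }


rows = load_segments(io.StringIO(SAMPLE_CSV))
print(count_by_domain(rows))
print(rebuild_articles(rows)["d1"])
```

With the real data, the in-memory string would be replaced by something like `with open(path, newline="", encoding="utf-8") as fh: rows = load_segments(fh)`, where `path` and the UTF-8 encoding are assumptions about how the files are distributed.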
The animal health domain is made up of 22 subdomains, with the following numbers of article segments: Avian Influenza (AI) 2,310; Highly Pathogenic Avian Influenza (HPAI) 3,060; African Swine Fever (ASF) 1,000; Foot-and-Mouth Disease (FMD) 1,165; Bovine Spongiform Encephalopathy (BSE) 770; Brucellosis 435; Peste des Petits Ruminants (PPR) 160; Bluetongue 165; Newcastle disease 155; Glanders 160; Disease X 140; Anthrax 120; West Nile Virus (WNV) 145; Middle East Respiratory Syndrome (MERS) 215; Infectious Salmon Anaemia (ISA) 110; Equine Influenza (EI) 200; Eastern Equine Encephalitis (EEE) 50; Porcine Reproductive and Respiratory Syndrome (PRRS) 85; Rift Valley Fever (RVF) 30; Classical Swine Fever (CSF) 40; Rabies 30; Venezuelan Equine Encephalitis (VEE) 55; Viral Haemorrhagic Septicaemia (VHS) 15.
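The animal health subdomains are strongly imbalanced, which is relevant when the segments are used for fine-tuning and evaluation. The sketch below simply transcribes the per-subdomain figures from the paragraph above (using the abbreviations as keys, a choice made here for brevity), sums them so the result can be compared with the headline figure of 10,660, and reports the size ratio between the largest and smallest subdomains. The function name `summarize` is illustrative.

```python
# Per-subdomain segment counts for the animal health domain, transcribed from the description.
ANIMAL_HEALTH_COUNTS = {
    "AI": 2310, "HPAI": 3060, "ASF": 1000, "FMD": 1165, "BSE": 770,
    "Brucellosis": 435, "PPR": 160, "Bluetongue": 165, "Newcastle disease": 155,
    "Glanders": 160, "Disease X": 140, "Anthrax": 120, "WNV": 145, "MERS": 215,
    "ISA": 110, "EI": 200, "EEE": 50, "PRRS": 85, "RVF": 30, "CSF": 40,
    "Rabies": 30, "VEE": 55, "VHS": 15,
}


def summarize(counts):
    """Return the total, the largest and smallest subdomain, and their size ratio."""
    total = sum(counts.values())
    largest = max(counts, key=counts.get)
    smallest = min(counts, key=counts.get)
    ratio = counts[largest] / counts[smallest]
    return total, largest, smallest, ratio


total, largest, smallest, ratio = summarize(ANIMAL_HEALTH_COUNTS)
print(f"sum of listed subdomain counts: {total}")
print(f"largest: {largest}, smallest: {smallest}, ratio: {ratio:.0f}x")
```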
The climate change domain is made up of 7 subdomains, with the following numbers of article segments: flash floods 340; drought 394; wildfires 184; hurricanes 349; heatwaves 385; global warming 299; tsunamis 340.
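Because the climate change subdomains differ in size (from 184 wildfire to 394 drought segments), a stratified split keeps each subdomain's share roughly constant between training and evaluation data. The following is one generic way to do this with the standard library; the 80/20 ratio, the fixed seed and the function name `stratified_split` are illustrative assumptions, not the split used by the dataset authors. The placeholder items stand in for real segment rows.

```python
import random
from collections import defaultdict

# Segment counts per climate change subdomain, transcribed from the description.
CLIMATE_COUNTS = {
    "flash floods": 340, "drought": 394, "wildfires": 184, "hurricanes": 349,
    "heatwaves": 385, "global warming": 299, "tsunamis": 340,
}


def stratified_split(items, label_of, test_fraction=0.2, seed=0):
    """Split items so each label keeps roughly the same train/test proportion."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for item in items:
        by_label[label_of(item)].append(item)
    train, test = [], []
    for label in sorted(by_label):  # sorted for a reproducible order
        group = by_label[label]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Placeholder items: (subdomain, index) pairs standing in for real segment rows.
items = [(sub, i) for sub, n in CLIMATE_COUNTS.items() for i in range(n)]
train, test = stratified_split(items, label_of=lambda item: item[0])
print(len(train), len(test))
```

With real rows, `label_of` could be `lambda r: r["article subdomain"]`, assuming that column name. Since several segments share a parent article (`doc id`), grouping by document before splitting may be preferable to keep near-duplicate text from the same article out of both splits.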
The original articles dataset (corpus) contains the full documents from which the segments were created. These documents were drawn from the original PADI-Web articles (relevant articles only) and from the GDELT database (news articles on food security and climate change events). (2025-05-02)