
preprints_ui: 7yb4c_v1

Denormalized preprint data with contributors and subjects for efficient UI access

Data license: ODbL (database) & original licenses (content) · Data source: Open Science Framework


id: 7yb4c_v1
title: Ben-Sarc: A Corpus for Sarcasm Detection from Bengali Social Media Comments and Its Baseline Evaluation
description: Sarcasm detection research of the Bengali language so far can be considered to be narrow due to the unavailability of resources. In this paper, we introduce a large-scale self annotated Bengali corpus for sarcasm detection research problem in the Bengali language named ’Ben-Sarc’ containing 25,636 comments, manually collected from different public Facebook pages and evaluated by external evaluators. Then we present a complete strategy to utilize different models of traditional machine learning, deep learning, and transfer learning to detect sarcasm from text using the Ben-Sarc corpus. Finally, we demonstrate a comparison between the performance of traditional machine learning, deep learning, and transfer learning models on our Ben-Sarc corpus. Transfer learning using Indic-Transformers Bengali BERT as a pre-trained source model has achieved the highest accuracy of 75.05%. The second highest accuracy is obtained by the LSTM model with 72.48% and Multinomial Naive Bayes is acquired the third highest with 72.36% accuracy for deep learning and machine learning, respectively. The Ben-Sarc corpus is made publicly available in the hope of advancing the Bengali Natural Language Processing community.
date_created: 2022-01-17T14:30:22.705261
date_modified: 2022-03-01T18:52:19.286313
date_published: 2022-01-17T15:43:24.658213
original_publication_date:
publication_doi:
provider: engrxiv
is_published: 0
reviews_state: withdrawn
version: 1
is_latest_version: 1
preprint_doi: https://doi.org/10.31224/osf.io/7yb4c
license: CC-By Attribution 4.0 International
tags_list: Bengali sarcasm; Bengali sarcasm detection; sarcasm; sarcasm detection
tags_data: ["Bengali sarcasm", "Bengali sarcasm detection", "sarcasm", "sarcasm detection"]
contributors_list: Sanzana Karim Lora; G. M. Shahariar; Tamanna Nazmin; Noor Nafeur Rahman; Rafsan Rahman; Miyad Bhuiyan; Faisal Muhammad shah
contributors_data: [{"id": "738rd", "name": "Sanzana Karim Lora", "index": 0, "orcid": "0000-0001-6647-1639", "bibliographic": true}, {"id": "b2xs5", "name": "G. M. Shahariar", "index": 1, "orcid": "0000-0001-9757-7663", "bibliographic": true}, {"id": "2seyf", "name": "Tamanna Nazmin", "index": 2, "orcid": null, "bibliographic": true}, {"id": "dbqpt", "name": "Noor Nafeur Rahman", "index": 3, "orcid": null, "bibliographic": true}, {"id": "hrjpv", "name": "Rafsan Rahman", "index": 4, "orcid": null, "bibliographic": true}, {"id": "mhc9v", "name": "Miyad Bhuiyan", "index": 5, "orcid": null, "bibliographic": true}, {"id": "dc4qy", "name": "Faisal Muhammad shah", "index": 6, "orcid": "0000-0002-5118-8571", "bibliographic": true}]
first_author: Sanzana Karim Lora
subjects_list: Engineering; Computer Engineering; Other Computer Engineering
subjects_data: [{"id": "5994df7a54be8100732d43ae", "text": "Engineering"}, {"id": "5994df7a54be8100732d43ba", "text": "Computer Engineering"}, {"id": "5994df7a54be8100732d43c0", "text": "Other Computer Engineering"}]
download_url:
has_coi: 0
conflict_of_interest_statement:
has_data_links: no
has_prereg_links: no
prereg_links: []
prereg_link_info:
last_updated: 2025-04-09T20:03:59.488268
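
The tags_data, contributors_data, and subjects_data columns in the row above hold JSON stored as text, alongside the flattened *_list columns. A minimal Python sketch of decoding them follows. It assumes the row has already been loaded into a dict keyed by column name; the `row` dict is an abbreviated copy of the record above, and `decode_json_columns` is a hypothetical helper, not part of Datasette or the OSF API.

```python
import json

# Abbreviated copy of the row above: only the columns used in this sketch.
row = {
    "id": "7yb4c_v1",
    "first_author": "Sanzana Karim Lora",
    "tags_data": (
        '["Bengali sarcasm", "Bengali sarcasm detection", '
        '"sarcasm", "sarcasm detection"]'
    ),
    "contributors_data": (
        '[{"id": "738rd", "name": "Sanzana Karim Lora", "index": 0, '
        '"orcid": "0000-0001-6647-1639", "bibliographic": true}, '
        '{"id": "b2xs5", "name": "G. M. Shahariar", "index": 1, '
        '"orcid": "0000-0001-9757-7663", "bibliographic": true}, '
        '{"id": "2seyf", "name": "Tamanna Nazmin", "index": 2, '
        '"orcid": null, "bibliographic": true}, '
        '{"id": "dbqpt", "name": "Noor Nafeur Rahman", "index": 3, '
        '"orcid": null, "bibliographic": true}, '
        '{"id": "hrjpv", "name": "Rafsan Rahman", "index": 4, '
        '"orcid": null, "bibliographic": true}, '
        '{"id": "mhc9v", "name": "Miyad Bhuiyan", "index": 5, '
        '"orcid": null, "bibliographic": true}, '
        '{"id": "dc4qy", "name": "Faisal Muhammad shah", "index": 6, '
        '"orcid": "0000-0002-5118-8571", "bibliographic": true}]'
    ),
    "subjects_data": (
        '[{"id": "5994df7a54be8100732d43ae", "text": "Engineering"}, '
        '{"id": "5994df7a54be8100732d43ba", "text": "Computer Engineering"}, '
        '{"id": "5994df7a54be8100732d43c0", "text": "Other Computer Engineering"}]'
    ),
}


def decode_json_columns(record, suffix="_data"):
    """Return a copy of record with non-empty '*_data' text columns parsed as JSON."""
    decoded = dict(record)
    for key, value in record.items():
        if key.endswith(suffix) and isinstance(value, str) and value:
            decoded[key] = json.loads(value)
    return decoded


decoded = decode_json_columns(row)

# Order contributors by their stored position, then pick those with an ORCID.
contributors = sorted(decoded["contributors_data"], key=lambda c: c["index"])
names_with_orcid = [c["name"] for c in contributors if c["orcid"]]
subject_names = [s["text"] for s in decoded["subjects_data"]]

print(names_with_orcid)
print(subject_names)
```

Matching on the `_data` suffix is a convention inferred from this one record; a table with other JSON columns (for example prereg_links) would need them listed explicitly.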