preprints_ui: 7yb4c_v1
Data license: ODbL (database) & original licenses (content) · Data source: Open Science Framework
id | title | description | date_created | date_modified | date_published | original_publication_date | publication_doi | provider | is_published | reviews_state | version | is_latest_version | preprint_doi | license | tags_list | tags_data | contributors_list | contributors_data | first_author | subjects_list | subjects_data | download_url | has_coi | conflict_of_interest_statement | has_data_links | has_prereg_links | prereg_links | prereg_link_info | last_updated |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
7yb4c_v1 | Ben-Sarc: A Corpus for Sarcasm Detection from Bengali Social Media Comments and Its Baseline Evaluation | Sarcasm detection research for the Bengali language has so far been limited by the unavailability of resources. In this paper, we introduce 'Ben-Sarc', a large-scale self-annotated Bengali corpus for sarcasm detection research containing 25,636 comments, manually collected from different public Facebook pages and evaluated by external evaluators. We then present a complete strategy for using traditional machine learning, deep learning, and transfer learning models to detect sarcasm in text with the Ben-Sarc corpus. Finally, we compare the performance of the traditional machine learning, deep learning, and transfer learning models on our Ben-Sarc corpus. Transfer learning using Indic-Transformers Bengali BERT as a pre-trained source model achieves the highest accuracy, 75.05%. The LSTM model obtains the second-highest accuracy at 72.48% and Multinomial Naive Bayes the third-highest at 72.36%, for deep learning and machine learning, respectively. The Ben-Sarc corpus is made publicly available in the hope of advancing the Bengali Natural Language Processing community. | 2022-01-17T14:30:22.705261 | 2022-03-01T18:52:19.286313 | 2022-01-17T15:43:24.658213 | | | engrxiv | 0 | withdrawn | 1 | 1 | https://doi.org/10.31224/osf.io/7yb4c | CC-By Attribution 4.0 International | Bengali sarcasm; Bengali sarcasm detection; sarcasm; sarcasm detection | ["Bengali sarcasm", "Bengali sarcasm detection", "sarcasm", "sarcasm detection"] | Sanzana Karim Lora; G. M. Shahariar; Tamanna Nazmin; Noor Nafeur Rahman; Rafsan Rahman; Miyad Bhuiyan; Faisal Muhammad shah | [{"id": "738rd", "name": "Sanzana Karim Lora", "index": 0, "orcid": "0000-0001-6647-1639", "bibliographic": true}, {"id": "b2xs5", "name": "G. M. Shahariar", "index": 1, "orcid": "0000-0001-9757-7663", "bibliographic": true}, {"id": "2seyf", "name": "Tamanna Nazmin", "index": 2, "orcid": null, "bibliographic": true}, {"id": "dbqpt", "name": "Noor Nafeur Rahman", "index": 3, "orcid": null, "bibliographic": true}, {"id": "hrjpv", "name": "Rafsan Rahman", "index": 4, "orcid": null, "bibliographic": true}, {"id": "mhc9v", "name": "Miyad Bhuiyan", "index": 5, "orcid": null, "bibliographic": true}, {"id": "dc4qy", "name": "Faisal Muhammad shah", "index": 6, "orcid": "0000-0002-5118-8571", "bibliographic": true}] | Sanzana Karim Lora | Engineering; Computer Engineering; Other Computer Engineering | [{"id": "5994df7a54be8100732d43ae", "text": "Engineering"}, {"id": "5994df7a54be8100732d43ba", "text": "Computer Engineering"}, {"id": "5994df7a54be8100732d43c0", "text": "Other Computer Engineering"}] | | 0 | | no | no | [] | | 2025-04-09T20:03:59.488268 |