With the rapid proliferation of Internet technologies and applications, misuse of online messages for inappropriate or illegal purposes has become a major concern for society.

Authorship attribution supported by statistical or computational methods has a long history starting from the 19th century and is marked by the seminal study of Mosteller and Wallace (1964) on the authorship of the disputed "Federalist Papers." During the last decade, this scientific field has developed substantially, taking advantage of research advances in areas such as machine learning, information retrieval, and natural language processing. The plethora of available electronic texts (e.g., e-mail messages, online forum messages, blogs, source code) indicates a wide variety of applications of this technology, provided it is able to handle short and noisy text from multiple candidate authors. In this article, a survey of recent advances in automated approaches to attributing authorship is presented, examining their characteristics for both text representation and text classification. The focus of this survey is on computational requirements and settings rather than on linguistic or literary issues. We also discuss evaluation methodologies and criteria for authorship attribution studies and list open questions that will attract future work in this area. © 2009 Wiley Periodicals, Inc.

Statistical authorship attribution has a long history, culminating in the use of modern machine learning classification methods. Nevertheless, most of this work suffers from the limitation of assuming a small closed set of candidate authors and essentially unlimited training text for each. Real-life authorship attribution problems, however, typically fall short of this ideal. Thus, following detailed discussion of previous work, three scenarios are considered here for which solutions to the basic attribution problem are inadequate. In the first variant, the profiling problem, there is no candidate set at all; in this case, the challenge is to provide as much demographic or psychological information as possible about the author. In the second variant, the needle-in-a-haystack problem, there are many thousands of candidates, for each of whom we might have a very limited writing sample. In the third variant, the verification problem, there is no closed candidate set but there is one suspect; in this case, the challenge is to determine if the suspect is or is not the author. For each variant, it is shown how machine learning methods can be adapted to handle the special challenges of that variant.

Question answering (QA) has significantly benefitted from deep learning techniques in recent years. However, domain-specific QA remains a challenge due to the significant amount of data required to train a neural network. This paper studies the answer sentence selection task in the Bible domain and answers questions by selecting relevant verses from the Bible. For this purpose, we create a new dataset, BibleQA, based on Bible trivia questions and propose three neural network models for our task. We pre-train our models on a large-scale QA dataset, SQuAD, and investigate the effect of transferring weights on model accuracy. Furthermore, we measure model accuracy with different answer context lengths and different Bible translations. We affirm that transfer learning yields a noticeable improvement in model accuracy. We achieve relatively good results with shorter context lengths, whereas longer context lengths decrease model accuracy. We also find that using a more modern Bible translation in the dataset has a positive effect on the task.

One of the modern problems in the current research and publication process is the duplication of other people's research results, which are then presented again by other parties. The more easily resources can be obtained, the greater the opportunity for the problem known as plagiarism. Computer systems attempt to address this with new approaches that detect and predict the existence of plagiarism in research automatically. In this article, the approaches and methods for detecting plagiarism use machine learning techniques, where machine learning serves as the algorithm for constructing and evaluating plagiarism detection. Technically, the algorithm compares the words and sentences in a document with those in a database of other documents and analyzes how closely they match, so that the analysis becomes material for evaluating, predicting, and determining whether the document is plagiarized. The purpose of this study is to protect intellectual property and ideas, and to improve the performance and accuracy of plagiarism detection.
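
The plagiarism abstract above describes comparing the words and sentences of a document against a database of other documents, but it does not say which algorithm is used. As a rough illustration only, the sketch below swaps in a simple, non-learned technique: Jaccard similarity over word n-grams ("shingles"). The function names (`word_shingles`, `jaccard`, `most_similar`), the shingle size of three words, and the toy database are all assumptions made for this example, not details from the article.

```python
import re


def word_shingles(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping n-word sequences in `text` (lowercased)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < n:
        # Too short for a full shingle: fall back to one shingle of all words.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: shared shingles divided by all distinct shingles."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def most_similar(query: str, database: dict[str, str], n: int = 3) -> tuple[str, float]:
    """Return (document id, score) for the entry in a non-empty `database`
    whose shingles overlap most with those of `query`."""
    query_shingles = word_shingles(query, n)
    scores = {
        doc_id: jaccard(query_shingles, word_shingles(text, n))
        for doc_id, text in database.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


if __name__ == "__main__":
    # Hypothetical two-document database, for illustration only.
    database = {
        "doc_a": "the quick brown fox jumps over the lazy dog",
        "doc_b": "machine learning methods for detecting plagiarism",
    }
    print(most_similar("the quick brown fox jumps over a sleeping cat", database))
```

A score near 1 means the two texts share almost all of their word sequences, and a score near 0 means they share almost none. Any cutoff for flagging a document would have to be chosen and validated on real data, and a learned classifier of the kind the abstract mentions could take such similarity scores as one of its input features.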