Text Selection Process

Investigate Text Selection Process



Table of Contents

Chapter I

Introduction

Background Information

Purposes of the Action Research Project

Research Questions

Identification of Sources of Data

Chapter II

Naïve Bayes and SVM classifiers

Stemming

The role of stopwords

Statistical feature selection

Classification evaluation methods

Chapter III

The Methodological Design of the Action Research

The Demographics and Setting of the Project

Plan for Communication

Anticipated Problems

Proposal Timeline

Chapter IV

Experiment 1: document representation model selection

Experiment 2: using stopwords as feature sets

Experiment 3: stemming

Experiment 4: statistical feature selection

Experiment 5: learning curve and confidence curve

Chapter V

Conclusion

Chapter I

Introduction

Text classification is a typical scholarly activity in literary study (Unsworth, 2000; Yu and Unsworth, 2006). Humanist scholars organize and study literary texts according to various classification criteria, such as topics, authors, styles, and genres. For decades, computational analysis tools have been used in some literary text classification tasks, such as authorship attribution (Mosteller and Wallace, 1964; Holmes, 1994) and stylistic analysis (Holmes, 1998). Recently, with the development of machine learning and natural language processing techniques, automatic text classification methods have provided new approaches to a wider range of literary text analysis problems (Argamon and Olsen, 2006). Examples include discriminant analysis and cross-entropy classification for authorship attribution and stylistic analysis (Craig, 1999; Juola and Baayen, 2005), decision tree classification for genre analysis of Shakespeare's plays (Ramsay, 2004), SVM classification for knowledge class assignment of the Encyclopédie entries (Horton et al., 2007), naïve Bayes classification for the eroticism analysis of Dickinson's poems (Plaisant et al., 2006), and naïve Bayes classification for sentimentalism analysis of early American novels (Horton et al., 2006).

With the availability of so many text classification methods, empirical evaluation is important to provide guidance for method selection in literary text classification applications. A number of studies have evaluated popular classification algorithms on a few benchmark data sets (Dumais et al., 1998; Joachims, 1998; Yang and Liu, 1999). However, these benchmark data sets were limited to news and web documents, which have different characteristics from the creative writings in literature. Moreover, in these evaluation studies, all methods were tested on topic classification tasks. In the setting of literary text classification, text documents are categorized by many document properties other than topics. Some target classes, such as authors and genres, are defined in an objective manner, while other classes, such as the sub-genres 'eroticism' and 'sentimentalism', are subjectively defined by the groups of scholars in these particular fields of study. Prediction is the common purpose of scientific classifiers. Hence, classifiers are usually evaluated by classification accuracy. However, improving classification accuracy is seldom the goal for literary scholars. High classification accuracy provides evidence that some patterns have been inferred to separate the classes. The scholars are more interested in the literary knowledge represented by these linguistic patterns. In other words, the usual purpose of literary classification is to seek suggestive evidence for further scholarly investigation of what textual features characterize the target classes (Ramsay, 2008). Sometimes scholars would also like to have classifiers as example-based retrieval tools to find more documents of a certain kind, such as ekphrastic poems and historicist catalog poems (Yu ...