
cainesap/sino-nlp


NLP for Chinese
===

As used for the Weibo age-profiling task reported at the Language Resources & Evaluation Conference 2016 (Zhang, Caines, Alikaniotis & Buttery, 'Predicting author age from Weibo microblog posts').

R scripts

normTextExtractFeatures.R
  • normalises Weibo posts, extracting linguistic and non-linguistic features in the process;
  • requires pre-obtained Weibo data: ours were Excel files with one row per user and one column per post;
  • requires the resources listed below;
  • look for 'CHECK PATHS' comments marking the file paths you should adapt to your own filesystem
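The normalisation and feature-extraction step can be illustrated with a minimal Python sketch. It is not a port of normTextExtractFeatures.R: the token patterns (URLs, @mentions, #topic# hashtags, bracketed emoticon codes such as [哈哈]), the placeholder words and the feature names are all hypothetical choices made for illustration. The spreadsheet layout is likewise assumed: one row per user, with the user ID in the first cell and one post per following cell.

```python
import re
from collections import Counter

# Hypothetical patterns for Weibo-specific tokens; the real script's rules may differ.
URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#[^#]+#")            # Weibo topics are wrapped in a pair of '#'
MENTION_RE = re.compile(r"@[^\s@:：,，]+")      # a mention runs until whitespace, colon or comma
EMOTICON_RE = re.compile(r"\[[^\[\]]{1,8}\]")  # text emoticon codes such as [哈哈]
HAN_RE = re.compile(r"[\u4e00-\u9fff]")        # CJK Unified Ideographs (basic block)


def normalise_post(post: str) -> tuple[str, Counter]:
    """Replace Weibo-specific tokens with placeholders and count each type as a feature."""
    features = Counter()
    text = post
    # URLs go first because they can themselves contain '#' or '@'.
    for name, pattern, placeholder in (
        ("url", URL_RE, " URL "),
        ("hashtag", HASHTAG_RE, " HASHTAG "),
        ("mention", MENTION_RE, " MENTION "),
        ("emoticon", EMOTICON_RE, " EMOTICON "),
    ):
        text, n = pattern.subn(placeholder, text)
        features[name] += n
    text = re.sub(r"\s+", " ", text).strip()
    # A simple linguistic feature: Chinese characters left after normalisation.
    features["han_chars"] = len(HAN_RE.findall(text))
    return text, features


def user_features(row: list[str]) -> Counter:
    """Sum post-level features over one spreadsheet row (user ID first, then posts)."""
    posts = [p for p in row[1:] if p]  # skip empty cells
    totals = Counter()
    for post in posts:
        _, feats = normalise_post(post)
        totals.update(feats)
    totals["posts"] = len(posts)
    return totals
```

Counting each token type in the same pass that replaces it keeps those markers out of the text later sent to the segmenter while still recording how often the author used them.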
segmentTagWeiboPosts.R
  • passes the normalised texts to the Stanford NLP word segmenter and part-of-speech tagger;
  • requires the Stanford NLP segmenter and POS tagger, both free downloads from the Stanford NLP Group;
  • look for 'CHECK PATHS' comments marking the file paths you should adapt to your own filesystem
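The hand-off to the Stanford tools might look like the following Python sketch, which only builds the command lines and parses the tagger's output. It assumes the usual distribution layouts: segment.sh in the segmenter directory (arguments: segmentation standard, input file, encoding, n-best size) and stanford-postagger.jar plus a models/ folder in the tagger directory. The jar name varies between releases, and the '#' word/tag separator is an assumption to check against the model's .props file. The helper names are hypothetical, not functions from segmentTagWeiboPosts.R.

```python
from pathlib import Path


def segmenter_cmd(seg_dir: Path, infile: Path, standard: str = "ctb") -> list[str]:
    """Command line for the Stanford segmenter's segment.sh ('ctb' or 'pku' standard)."""
    # Arguments: standard, input file, encoding, n-best size (0 = single best output).
    return [str(seg_dir / "segment.sh"), standard, str(infile), "UTF-8", "0"]


def tagger_cmd(tagger_dir: Path, infile: Path,
               model: str = "chinese-distsim.tagger") -> list[str]:
    """Command line for the Stanford POS tagger's MaxentTagger on pre-segmented text."""
    # The jar file name is an assumption; releases may ship a versioned jar instead.
    return [
        "java", "-mx300m",
        "-cp", str(tagger_dir / "stanford-postagger.jar"),
        "edu.stanford.nlp.tagger.maxent.MaxentTagger",
        "-model", str(tagger_dir / "models" / model),
        "-textFile", str(infile),
    ]


def parse_tagged(line: str, sep: str = "#") -> list[tuple[str, str]]:
    """Split 'word#TAG word#TAG ...' into (word, tag) pairs.

    '#' is assumed to be the Chinese models' tag separator; tokens without
    the separator are kept with an empty tag.
    """
    pairs = []
    for token in line.split():
        word, found, tag = token.rpartition(sep)
        pairs.append((word, tag) if found else (token, ""))
    return pairs
```

Either command list can be passed to `subprocess.run(cmd, capture_output=True, text=True, check=True)`. The segmenter should write space-separated words to standard output, which can be saved to a file and given to the tagger.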

Resources
