(bismillah) (salam)

God willing, this thread will be the first in a series of threads in which I hope to apply Text Mining to investigate various aspects of the discussion forums (or better, the e-community) here at ShiaChat. In the sentence above, I have highlighted the important parts, which I hope the reader also found important and is curious about. I explain them below.

Background

Text Mining: It is the art and science of extracting, refining, and processing textual data in order to produce information and/or knowledge. 'Mining' is the metaphor used for the complete process because the process resembles mining: we extract data from various places (USB drives, databases, emails, social networking sites, etc.), refine it into a format amenable to statistical analysis, and then process it to retrieve information or knowledge.

Example: An organisation may have carried out a survey regarding its services/products in which it asked various quantitative and qualitative questions. While the quantitative questions can be processed easily to derive basic information (such as how many respondents prefer product x over y), the qualitative questions are trickier. The problem deepens when the number of responses crosses a thousand, as manually reading a thousand responses is time-consuming as well as boring. Here, Text Mining could be used to provide summaries, identify groups of responses that prefer product x over y, identify why product x is preferred, etc.

Investigating various aspects: Text Mining can prove useful for various purposes, such as Sentiment Analysis, Topic Mining, Text Clustering, Text Summarisation, etc. Sentiment analysis is the process of identifying the sentiments of the contributors of textual data by analysing the text.
It is of tremendous utility for organisations, as it enables them to see which aspects of their service customers are happy with and which they are dissatisfied with. Topic mining and Text Clustering are the processes of identifying topics in textual data and identifying groups of similar items in textual data, respectively. Text summarisation produces summaries of textual data, highlighting the most important parts of the data.

E-community: ShiaChat is not only a discussion forum, but also a community that functions online. Given the data about this community, various research questions can be posed that may be answerable by the techniques mentioned in the previous paragraph (summarisation, clustering, etc.). Furthermore, Social Network Analysis can also be performed on the community using its data.

Application of Text Mining to User Interests at ShiaChat

Hoping that the preceding provides a brief yet coherent introduction to Text Mining, we may proceed to ask questions that may be of interest and utility to us. We may ask how the interests of posters vary, if at all, among the various discussion forums at ShiaChat. I chose 7 of the many forums at ShiaChat:

1) General
2) Prophet (pbuh) and AhlulBayt (as)
3) Social
4) Science and Technology
5) Shia-Sunni
6) Philosophy
7) Politics

I used R, which is free software available for download here. R uses various packages for different applications, and in this case I used the XML, tm, igraph, and wordcloud packages.

Without further ado, may I present a comparison word cloud, in which the 'Interests' of posters have been broken down into single words and used to form a word cloud. (Apologies: since I am receiving error 500 for both JPG and PNG images, I have to provide a simple link that takes readers to the image, here.)
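For readers who want to see what "broken down into single words" amounts to, here is a minimal sketch of that step. It is written in Python rather than the R packages used for the actual analysis, and it is not the code behind the cloud: the sample interests strings, the tiny stop-word list, and the function names are all made up for illustration.

```python
import re
from collections import Counter

# Hypothetical sample data: forum name -> list of 'Interests' profile strings.
# These strings are invented for illustration, not taken from ShiaChat.
interests_by_forum = {
    "Science and Technology": ["Reading, physics, reading papers", "Programming; reading"],
    "Philosophy": ["Spirituality and logic", "spirituality, poetry"],
}

# A tiny stop-word list (an assumption; a real analysis would use a fuller one).
STOP_WORDS = {"and", "the", "of", "a", "in", "to"}


def tokenize(text):
    """Lower-case the text and split it into alphabetic words, dropping stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def word_counts_by_forum(data):
    """Return {forum: Counter of words} built from every interests string."""
    return {
        forum: Counter(w for text in texts for w in tokenize(text))
        for forum, texts in data.items()
    }


counts = word_counts_by_forum(interests_by_forum)
for forum, counter in counts.items():
    # Show the three most frequent interest words for each forum.
    print(forum, counter.most_common(3))
```

The per-forum word counts produced this way are the kind of input a comparison cloud is drawn from: one column of word frequencies for each forum.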
The different discussion forums are coloured black to act as tags for the different parts of the comparison cloud, and words appearing in the 'interests' of posters in each discussion forum are also coloured accordingly. It can be seen, for example, that those actively commenting in the Social forum have an interest in psychology, literature, cinematography, buddhism, as well as 'shapeshifting', 'misguiding', etc.

The word cloud is configured so that the more often a particular word occurs, the larger it appears relative to other words. Of course, this means that if a single user posts most often, or has been posting often recently, in a forum, then their interests are magnified. It may also mean that certain interests written in humour are picked up and possibly misinterpreted as serious interests. Nonetheless, what is significant is that we were able to identify the interests of users without manually reading any of them. Other information, such as the prevalent interest in reading in the Science group, spirituality in the Philosophy group, table tennis + nasibis in the Politics group, and the rest, can be left to the reader.

In case the enlarged image of the comparison cloud is not clear enough, a pdf version can be downloaded too, which can be zoomed for further inspection of interests.

comparisonCloudInterests.pdf

For interested readers, an introductory presentation on Text Mining can also be viewed here.

--------------------------------------

God willing, next time I hope to look at one of the following: what discussions have been going on in the Philosophy group lately; how the discussions in the Shia-Sunni forum at SC differ from those at HCY; what topics female members focus on at SC and how, if at all, these differ from the topics of male members; or what the characteristics of banned members are, and whether these can be exploited to predict if a certain member is a banned member with a new account, or needs to be banned.
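On the caveat above about a prolific poster magnifying their own interests, a small sketch may make the effect concrete. It is in Python, not the R used for the analysis, and everything in it is an assumption for illustration: the post log is invented, the data are assumed to be collected once per post, and the linear font-size mapping is a stand-in rather than the wordcloud package's own scaling.

```python
from collections import Counter

# Hypothetical post log for one forum: (poster, words from that poster's interests).
# One row per post, so a prolific poster appears many times.
posts = [
    ("user_a", ["tennis", "reading"]),
    ("user_a", ["tennis", "reading"]),
    ("user_a", ["tennis", "reading"]),
    ("user_b", ["history"]),
    ("user_c", ["history", "reading"]),
]


def counts_per_post(rows):
    """Count a poster's interests once for every post they made."""
    return Counter(w for _, words in rows for w in words)


def counts_per_user(rows):
    """Count each poster's interests only once, however often they post."""
    unique = {user: words for user, words in rows}
    return Counter(w for words in unique.values() for w in words)


def font_sizes(counts, min_size=10, max_size=40):
    """Map counts linearly onto a font-size range: more frequent means larger."""
    top = max(counts.values())
    return {w: min_size + (max_size - min_size) * c / top for w, c in counts.items()}


per_post = counts_per_post(posts)
per_user = counts_per_user(posts)
print("per post:", font_sizes(per_post))
print("per user:", font_sizes(per_user))
```

Counted per post, 'tennis' is drawn large purely because user_a posted three times; counted per user, it shrinks to the weight of a single member's interest.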
Thank you for taking the time to read this. (wasalam)
(bismillah) (salam) Do we have any enthusiasts of R, the environment for Statistical Computing, here? I am currently working on a text mining task, using R (for the first time), and I was wondering if this site had some R programmers. (wasalam)