Content-Sensitive Characterization of Peer Interactions of Highly Engaged Users in an Online Community for Smoking Cessation: Mixed-Methods Approach for Modeling User Engagement in Health Promotion Interventions

Background Online communities provide affordable venues for behavior change. However, active user engagement holds the key to the success of these platforms. In order to enhance user engagement and in turn, health outcomes, it is essential to offer targeted interventional and informational support. Objective In this paper, we describe a content plus frequency framework to enable the characterization of highly engaged users in online communities and study theoretical techniques employed by these users through analysis of exchanged communication. Methods We applied the proposed methodology for analysis of peer interactions within QuitNet, an online community for smoking cessation. Firstly, we identified 144 highly engaged users based on communication frequency within QuitNet over a period of 16 years. Secondly, we used the taxonomy of behavior change techniques, text analysis methods from distributional semantics, machine learning, and sentiment analysis to assign theory-driven labels to content. Finally, we extracted content-specific insights from peer interactions (n=159,483 messages) among highly engaged QuitNet users. Results Studying user engagement using our proposed framework led to the definition of 3 user categories—conversation initiators, conversation attractors, and frequent posters. Specific behavior change techniques employed by top tier users (threshold set at top 3) within these 3 user groups were found to be goal setting, social support, rewards and threat, and comparison of outcomes. Engagement-specific trends within sentiment manifestations were also identified. Conclusions Use of content-inclusive analytics has offered deep insight into specific behavior change techniques employed by highly engaged users within QuitNet. Implications for personalization and active user engagement are discussed.

Introduction health consumers [4]. Empowered and engaged health consumers (and patients) in conjunction with technological advances can create a more streamlined and efficient system from the current health sector, which is chaotic and fragmented [1]. To this end, social media and online communities (often referred to as the "participative internet" [5]) are gaining popularity as venues for management of health and wellness. Traditionally, health care services and clinicians act as intermediaries in providing health consumers and patients with relevant and vetted health information. However, with the penetration of consumer-driven Web, these new generation platforms have been gaining popularity as predominant information sources where social contacts, collaborative tools (eg, content recommendation systems) and agents (eg, virtual assistants) guide health consumers and patients toward adoption and maintenance of positive health behaviors [6]. Accessibility, scalability, and affordability are key characteristics that can make these platforms effective public health interventions for health promotion and behavior change [7]. In order to achieve the full potential of these platforms, we need to address challenges related to adoption and sustained use [8][9][10]. Understanding the facets of user engagement in an online health community can help us drive acceptance and enhance utility of online social platforms in health care. Strong user engagement with a technology intervention may enhance outcomes [8]. Health consumers use internet-based resources differently than they do in-person and group programs, and hence, the characteristics of subpopulations must be assessed to determine the factors leading to sustained use [11]. Further, such nuanced understanding of patterns of use can help us optimize care pathways and personalize design of technologies to achieve long-term engagement [12].
There are multiple definitions of user engagement in health care settings [13][14][15][16][17]. As defined by Lalmas [17], user engagement is "the quality of user experience that emphasizes the desire of a user to use the application longer and in frequent intervals." Traditionally, user engagement metrics are calculated by number of page views or time spent on specific activities [18]. In the context of a social platform, however, the volume of posts (also known as communication frequency) is the most used criterion to describe levels of user engagement in both short-term and long-term use. This quality of user experience can be described using a complex set of factors that are emotional, cognitive, and behavioral in nature [13]. Existing metrics do not capture this holistic view of user engagement with a technological resource such as an online social platform.
Previous studies on user engagement in online communities have focused on (1) enhancing user engagement through gamification and related techniques [19], (2) estimating the impact of personalization on engagement [20], (3) studying engagement from the perspective of the 3 main types of social support-informational support, emotional support, and companionship [21], (4) studying user engagement from the perspective of user roles such as lurkers [22,23] and user evolution across these roles [8], (5) estimating engagement through data logs [24], and (6) studying social bootstrapping and social curation as possible aspects of improved user engagement [25,26]. On the other hand, our prior research on online social platforms has highlighted the need to consider content of user communication to better characterize peer interactions [27][28][29][30]. The content itself can have a strong influence on user engagement [27]. Also, sustained engagement is likely when the intervention offers ongoing learning and interactive opportunities. The extent to which these requirements are fulfilled can be understood by studying the content of communication from a theoretical perspective [30]. However, the dynamics of content-based attributes and user engagement in online communities has rarely been studied together.
In our study, we aimed to address these gaps by developing new methods inclusive of communication frequency and communication content, thus enabling the study of user engagement in online health communities. The relationship between these communication attributes and user engagement is detailed in the literature [5,20,31,32]. The methodology described in the paper takes an integrated approach from various user engagement models, some of which are specific to online communities for health behavior change [31,32]. We apply this proposed new model to analyze peer interactions in QuitNet, an online community for smoking cessation.

Materials
QuitNet is one of the first online communities for behavior change with historically over 100,000 new registrants per year [33]. Previous studies show that participation in QuitNet is associated with abstinence [34]. In this paper, we examined a data set from a version of QuitNet that ran until approximately 2015 where the primary mode of communication was threaded forums; in other words, peer conversations were initiated with an initial message and replies were displayed hierarchically. Each forum message is identified using a message ID, a thread ID, a sender ID, and a recipient ID. Participants set quit dates which represent abstinence from smoking and are preserved in historical logs if they change. This research project was reviewed and exempted by the Institutional Review Board at the University of Texas Health Science Center at Houston.

Characterization of QuitNet Users
We defined 3 QuitNet user groups based on the frequency in which users engaged in peer interactions within QuitNet. Empirical cut points based on previous research were used to define the groupings as described below: For the purpose of this study, our analysis included the top 3 QuitNet users within each user group (that is, 9 users in 3 groups in a given year) across the years 2000 to 2015, resulting in 144 unique users. Overall, 26,466 messages were exchanged by conversation attractors, 57,379 messages were exchanged by conversation initiators, and 75,638 messages were exchanged by frequent posters. In total, these 144 highly engaged users exchanged 159,483 messages. Further, we analyzed the user demographics (age, gender) and self-reported smoking status where available in event logs. For users whose event logs were incomplete or unavailable (42.4% [61/144] of the users), we analyzed their messages manually, leveraging the fact that QuitNet users specify the number of days since they last smoked in every message as a form of tradition. From these data, we estimated user abstinence status across years and identified users as falling into one of the following categories: (1) abstinent for less than 3 months, (2) abstinent between 3 and 6 months, (3) abstinent between 6 months and 1 year, (4) abstinent between 1 and 2 years, (5) abstinent for more than 2 years, and (6) active smoker.

Characterization of QuitNet Communication Content
We analyzed 2.05 million messages generated by 102,005 unique users, exchanged between the years 2000 and 2015 for this purpose. Using a series of qualitative and automated text analysis methods, we categorized QuitNet messages into 16 themes.

Qualitative Analysis
We selected 2000 messages at random to produce an annotated subset of QuitNet messages ( Figure 1). These messages were manually coded using the taxonomy of behavior change techniques [35] by 2 independent coders. The taxonomy has 93 theoretically linked behavior change techniques clustered into 16 thematic categories that were developed by a team of behavior change experts and drawn from multiple behavior change theories such as social change theory and the health belief model. The list of techniques of the taxonomy with definitions and subcategories can be found in Michie et al [35]. Table 1 shows snippets of sample messages from QuitNet that correspond to a particular technique of the taxonomy. From the sample messages, it can be seen that a single message in QuitNet can be mapped to multiple techniques of the taxonomy. The interrater reliability was estimated at 0.76 for the 2 coders during annotation. Most disagreements were due to multiple labels assigned to each message. All coding conflicts were resolved through discussion before the manual annotation of QuitNet messages was finalized.

Automated Text Analysis
All messages in the dataset were annotated by providing the QuitNet vector representations to a machine learning classifier trained on the manually annotated messages. The components of the process are represented in Figure 2. To generate vector representations of messages, we used neural word embeddings, specifically the Skipgram-with-Negative-Sampling (SGNS) algorithm developed by Mikolov and colleagues [36], as implemented in the open source semantic vectors [37] package for distributional semantics. With SGNS, a neural network is trained to predict the probability of encountering terms that occur in proximity to an observed term. For example, one might anticipate a relatively high probability of observing the term "relaxing" in proximity to the term "hammock." During the course of training, the neural network learns to predict higher probabilities for observing contextual terms that are observed in proximity to a term in the corpus than it does for those that are not observed in proximity to this term. Each term is associated with a set of weights-a weight vector-that encodes the terms that have been observed in proximity to it. Terms that occur in similar contexts will have similar weight vectors, and the similarity between the resulting weight vectors has been shown to correlate well with human estimates of the similarity between concept pairs. An advantage of this approach is that it permits the incorporation of background knowledge into a categorization model. For example, if 2 QuitNet users refer in messages to a concept such as "sadness" with different terms (such as "feeling blue," "miserable today"), a classifier will be able to assign a category attached to one of these messages from the training set to the other previously unseen message because the words in these messages have similar representations, even though they are not identical words. Information of this sort can be learned by SGNS from a large unannotated general domain corpus.
For our research, Wikipedia was used as a background corpus. Our Wikipedia corpus contains 1.9 billion words in more than 4.4 million articles, and 500-dimensional Wikipedia-derived term vectors were obtained by applying the SGNS algorithm to the Wikipedia background corpus. This decision was motivated in part by the terse nature of the messages exchanged in QuitNet user forums, which often do not provide enough contextual information to train a distributional model [28]. However, there are ways in which the language used in QuitNet differs from that used in the Wikipedia. Of particular importance for the current research, QuitNet messages contain neologisms, community-specific terms such as "nicodemon" and "sickarettes." While these terms are unlikely to appear in Wikipedia, within QuitNet they occur in the context of other terms for which robust representations have been learned from this larger corpus. As such, the distributional information learned from Wikipedia can be viewed as a form of prior knowledge, providing a starting point for the next phase-distributional modeling of the QuitNet corpus. This component of the training process used an iterative training procedure [38] in which term and message vectors are generated in succession.
Specifically, we first superposed (added together) the Wikipedia term vectors for the terms that occur in each QuitNet message to obtain Wikipedia-based QuitNet message vectors. We then composed term vectors for the terms that occur in QuitNet by adding QuitNet message vectors for each message in which a given term in QuitNet occurred. As such, these term vectors encode distributional information from Wikipedia and from QuitNet-specific contextual use of terms. Finally, QuitNet message vectors were generated by superposing these term vectors. This procedure is illustrated in Figure 2. Wow!!! xyxx is correct. You control your attitude. Deep breathing, chew gum, take a walk (or maybe a hike:-) Hang in there. This is not easy, your an addict.

Repetition and substitution
At this point in your Quit it may be best to look at more immediate gains such as money saved or improved self esteem or better health. My dollar savings are $5,293 and that is real and for now. My life saved is an unrealized 15 weeks and 20 minutes that may never happen.

Comparison of outcomes
Congratulations on your Quit. In the end YOUWIN...IWIN...WEWIN! Rewards and threat If anxiety is an issue for you in general or you used smoking to alleviate it, using new coping tools and behaviors to help you relax and unwind will help this symptom fade quicker. The important thing is to keep going Regulation I found this site before I quit also, and it made an incredible difference in my ability to be successful. Come over to the QuitStop Forum on this site and you will meet many people with all different stories about what worked for them and what didn't.

Antecedents
Great insight, YYY. But have been suffering a lot of depression type symptoms ....never occurred to me could be related to the smoking cessation. That actually helps me see things a little differently.

Identity
Great News Also!!! After further Review, ZZZ it is now official! As of 6:00 AM you are on the Q Anni List so your 2 month quit is officially good now! The points will stay on the board! Keep up the great play calling! Hugs  The components of the vectors generated in this way were used as feature vectors for supervised machine learning that was conducted using the widely used Waikato Environment for Knowledge Analysis open source package for machine learning [39]. Each of the techniques of the behavior change taxonomy was used as a target for classification. Ten-fold cross-validation was applied using the naïve Bayes classifier to evaluate a binary classifier for each of the themes. Each of the trained and validated classification models was then used to classify the entire set of 2.05 million messages. Given the highly engaged users in this paper, we focused on the manifestation of behavior change techniques within the messages exchanged by 144 users identified in the earlier step, thus limiting our analysis to 159,483 messages exchanged by these users.

Sentiment analysis of user communication in QuitNet was
performed using the open source software R (The R Foundation) on 159,483 messages exchanged by highly engaged users. Multiple packages were evaluated by comparing their output to manual annotation provided by 2 independent coders. Based on reliability measures, we chose the best performing package to ensure the suitability of the classification to QuitNet interactions. The SentimentAnalysis package [40] in R yielded the best results at 0.81 (average system-rater agreement measured using Cohen kappa) reaching good interrater agreement. This led to classification of messages by 144 users into the categories positive, negative, and neutral using the function convertToDirection(sentiment$SentimentQDAP). The inbuilt dictionaries within the R package were used to assign a particular sentiment class to the QuitNet messages.

Characterization of QuitNet Users
The average age of the 144 QuitNet users considered in this analysis was 49 (SD 9.4) years with 82.6% (119/144) female users. Among conversation attractors, 69% (33/48) were female users with an average age of 48 (SD 10.1) years. The remaining 31% (16/48) male users had an average age of 42 (SD 6.6) years. Of the conversation initiators, 85% (41/48) were female users with an average age of 52 (SD 9.4) years. The remaining 15% (7/48) were male users with an average age of 46 (SD 6.8) years. Of the frequent posters, 94% (45/48) were female users with an average age of 49 (SD 8.6) years. The remaining 6% (3/48) were male with an average age of 51 (SD 10.2) years. Figure 3 provides the variations in smoking status across these user groups. Among conversation initiators, users who had been abstinent for more than 2 years accounted for 58% (28/48). The highest proportion of users among conversation attractors was individuals who were within 3 months of a quit attempt, with close to 42% (20/48) of the users in this category. Among frequent posters, QuitNet users with varying lengths of quit attempts had equal representation.

Characterization of QuitNet Communication Content
Due to insufficient positive examples in the training set, we disregarded 8 of the 16 techniques of the taxonomy for final classification. For the remaining 8 techniques, the precision, recall, and f-measure for the cross-validation of the automated classification method using the naïve Bayes classifier were 0.80, 0.70, and 0.71, respectively. The themes considered for further analysis were goals and planning, feedback and monitoring, social support, natural consequences, comparison of behavior, comparison of outcomes, rewards and threat, and self-belief.

Behavior Change Techniques
The average percentages of messages across different QuitNet user groups are shown in Figure 4. As seen in the figure, we observed manifestation of goals and planning and comparison of outcomes in 72.00% (19,056/26,466) and 41.00% (10,851/26,466), respectively, of the messages exchanged by conversation attractors. Among conversation initiators, the most exchanged techniques were rewards and threat and social support, in 63.00% (36,149/57,379) and 48.00% (27,542/57,379) of the messages, respectively. Among frequent posters, feedback and monitoring followed by goals and planning were embedded in 74.00% (55,972/75,638) and 49.00% (37,062/75,638) of the messages, respectively.

Sentiments
Considering messages exchanged by conversation attractors (see Figure 5), 45.00% (11,910/26,466) of messages were found to be containing positive sentiments, while messages with negative sentiments accounted for 40.00% (10,586/26,466). Conversation initiators' posts were more positive, with 50.00% (28,690/57,379) of their messages classified as positive and about 23.00% (13,197/57,379) classified as negative. The frequent posters' discussions were also positive in nature, with 78.00% (58,998/75,638) of the messages classified as containing positive sentiments.

Principal Findings
In this study, we developed a novel content plus frequency framework to identify highly engaged users across 3 social categories and identify each category's most common behavior change strategies. These methods and findings can help interventionists, technology developers, and health professionals (1) understand the techniques frequently used by highly engaged users in an online health community; (2) model, implement, and evaluate these known techniques to improve user engagement in other modalities and intervention programs; and (3) develop population health management strategies by considering the observed communication characteristics and behavioral attributes of highly engaged users.
Conversation attractors were predominantly recently quit users who appear to be facing difficulties after quitting and were seeking support from other users. They tend to discuss goals and planning and comparison of outcomes with tendencies toward negative sentiment, consistent with the communication of difficulties in the acute cessation process. Conversely, conversation initiators on QuitNet were veteran users who had been abstinent for more than 2 years. Their long duration in the community, abstinence, and high percentage of messages covering social support and rewards and threat (with a positive sentiment trend) suggests that they serve as community elders, motivating and supporting the quit attempts of newer users. Frequent posters were overwhelmingly female (96%) with no specific trend in their smoking status. The majority of their messages were positive (78%) and tended to refer to popular traditions [27] within QuitNet (such as virtual bonfire events or pledging to not smoke). In QuitNet, these techniques manifest in the form of users sharing positive and negative consequences they have encountered after quitting and pledges to not smoke for the day. Internal norms, traditions, and celebrations form a central core of engagement within QuitNet [27] and other online behavior change platforms that emphasize social support. These results suggest that identification and engagement of conversation attractors in particular may allow intervention designers to modify and enhance or even create new norms and traditions for dissemination through a community.

Limitations
There are a number of limitations to this study. Given our primary motivation was the development of new methods, we arbitrarily set thresholds for the top engaged users. Expansion of thresholds could change the characteristics of the findings. The manual coding process of behavior change techniques was limited to 16 high-level categories, which may not fully cover the array of techniques present in QuitNet or other programs. The automated text analysis methods can be improved using advanced word representation techniques (eg, convolutional neural networks [41,42]). Finally, we used off-the-shelf sentiment analysis tools, which can be improved with n-grams, corpus-specific advanced machine learning classifiers, or specialized dictionaries [43,44].

Conclusions
Sustained engagement with social support and effective behavior change programs remains the core principal of most online platforms that seek participation of patients in their own care. Support from peers, community-driven rituals, guidance from veteran users, and a safe and secure environment for open communication are a few motivational factors that have been hypothesized to drive engagement in QuitNet and are confirmed in this analysis. Insights from this study and future work using the same or similar framework may allow the implementation of targeted recommendation engines [45][46][47] to promote meaningful affiliations with content (such as information about nicotine substitutes) and connections (such as connecting users who have quit recently to veteran users) personalized based on user characteristics (eg, age, gender, smoking status). Ultimately, achieving the promise of patient participation in online communities requires the platforms themselves to evolve in response to individual and network changes and preferences that develop over time. This study offers a potential framework to drive such observation, evaluation, and ultimate change to better serve users and close the loop between intervention providers and their participants.