
1295 papers with code • 39 benchmarks • 93 datasets

Sentiment Analysis is the task of classifying the polarity of a given text. For instance, a text-based tweet can be categorized into either "positive", "negative", or "neutral". Given the text and accompanying labels, a model can be trained to predict the correct sentiment.

Sentiment Analysis techniques can be categorized into machine learning approaches, lexicon-based approaches, and hybrid methods. Subcategories of sentiment analysis research include multimodal sentiment analysis, aspect-based sentiment analysis, fine-grained opinion analysis, and language-specific sentiment analysis.

More recently, deep learning techniques such as RoBERTa and T5 have been used to train high-performing sentiment classifiers, which are evaluated using metrics such as F1 score, recall, and precision. Benchmark datasets such as SST, GLUE, and IMDb movie reviews are used to evaluate sentiment analysis systems.
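As a rough illustration (not tied to any particular benchmark above), a pretrained transformer can be applied to sentiment classification with the Hugging Face transformers pipeline; the checkpoint name below is an assumption and can be swapped for any fine-tuned sentiment model.

```python
# Minimal sketch: sentiment classification with a pretrained RoBERTa-based model
# via the Hugging Face `transformers` pipeline. The checkpoint name is an
# assumption; any fine-tuned sentiment checkpoint works the same way.
from transformers import pipeline

classifier = pipeline(
    "sentiment-analysis",
    model="cardiffnlp/twitter-roberta-base-sentiment-latest",  # assumed checkpoint
)

texts = [
    "The new update is fantastic!",
    "This is the worst service I have ever used.",
]
for text, result in zip(texts, classifier(texts)):
    print(f"{text!r} -> {result['label']} ({result['score']:.3f})")
```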

Further reading:

  • Sentiment Analysis Based on Deep Learning: A Comparative Study



Most implemented papers

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers.

Convolutional Neural Networks for Sentence Classification


We report on a series of experiments with convolutional neural networks (CNN) trained on top of pre-trained word vectors for sentence-level classification tasks.

Universal Language Model Fine-tuning for Text Classification


Inductive transfer learning has greatly impacted computer vision, but existing approaches in NLP still require task-specific modifications and training from scratch.

Bag of Tricks for Efficient Text Classification

facebookresearch/fastText • EACL 2017

This paper explores a simple and efficient baseline for text classification.

RoBERTa: A Robustly Optimized BERT Pretraining Approach

Language model pretraining has led to significant performance gains but careful comparison between different approaches is challenging.

A Structured Self-attentive Sentence Embedding

This paper proposes a new model for extracting an interpretable sentence embedding by introducing self-attention.

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP).

Deep contextualized word representations

We introduce a new type of deep contextualized word representation that models both (1) complex characteristics of word use (e.g., syntax and semantics), and (2) how these uses vary across linguistic contexts (i.e., to model polysemy).

Well-Read Students Learn Better: On the Importance of Pre-training Compact Models

Recent developments in natural language representations have been accompanied by large and expensive models that leverage vast amounts of general-domain text through self-supervised pre-training.

Domain-Adversarial Training of Neural Networks

Our approach is directly inspired by the theory on domain adaptation suggesting that, for effective domain transfer to be achieved, predictions must be made based on features that cannot discriminate between the training (source) and test (target) domains.

  • Methodology
  • Open access
  • Published: 16 June 2015

Sentiment analysis using product review data

  • Xing Fang &
  • Justin Zhan

Journal of Big Data, volume 2, Article number: 5 (2015)


Abstract

Sentiment analysis or opinion mining is one of the major tasks of natural language processing (NLP). Sentiment analysis has gained much attention in recent years. In this paper, we aim to tackle the problem of sentiment polarity categorization, which is one of the fundamental problems of sentiment analysis. A general process for sentiment polarity categorization is proposed with detailed process descriptions. Data used in this study are online product reviews collected from Amazon.com. Experiments for both sentence-level categorization and review-level categorization are performed with promising outcomes. Finally, we give insight into our future work on sentiment analysis.

Introduction

Sentiment is an attitude, thought, or judgment prompted by feeling. Sentiment analysis [1-8], which is also known as opinion mining, studies people's sentiments towards certain entities. The Internet is a resourceful place with respect to sentiment information. From a user's perspective, people are able to post their own content through various social media, such as forums, micro-blogs, or online social networking sites. From a researcher's perspective, many social media sites release their application programming interfaces (APIs), prompting data collection and analysis by researchers and developers. For instance, Twitter currently has three different versions of APIs available [9], namely the REST API, the Search API, and the Streaming API. With the REST API, developers are able to gather status data and user information; the Search API allows developers to query specific Twitter content; and the Streaming API is able to collect Twitter content in real time. Moreover, developers can mix those APIs to create their own applications. Hence, sentiment analysis appears to have a strong foundation, supported by massive amounts of online data.

However, those types of online data have several flaws that potentially hinder the process of sentiment analysis. The first flaw is that since people can freely post their own content, the quality of their opinions cannot be guaranteed. For example, instead of sharing topic-related opinions, online spammers post spam on forums. Some spam is entirely meaningless, while other spam consists of irrelevant opinions, also known as fake opinions [10-12]. The second flaw is that ground truth for such online data is not always available. A ground truth is essentially a tag on a certain opinion, indicating whether the opinion is positive, negative, or neutral. The Stanford Sentiment 140 Tweet Corpus [13] is one of the datasets that has ground truth and is also publicly available. The corpus contains 1.6 million machine-tagged Twitter messages. Each message is tagged based on the emoticons (☺ as positive, ☹ as negative) discovered inside the message.

Data used in this paper is a set of product reviews collected from Amazon [14] between February and April 2014. The aforementioned flaws have been somewhat overcome in the following two ways: First, each product review receives inspections before it can be posted [a]. Second, each review must have a rating that can be used as the ground truth. The rating is based on a star-scaled system, where the highest rating has 5 stars and the lowest rating has only 1 star (Figure 1).

Rating System for Amazon.com.

This paper tackles a fundamental problem of sentiment analysis, namely sentiment polarity categorization [15-21]. Figure 2 is a flowchart that depicts our proposed categorization process as well as the outline of this paper. Our contributions mainly fall into Phases 2 and 3. In Phase 2: 1) an algorithm is proposed and implemented for negation phrase identification; 2) a mathematical approach is proposed for sentiment score computation; 3) a feature vector generation method is presented for sentiment polarity categorization. In Phase 3: 1) two sentiment polarity categorization experiments are performed, at the sentence level and the review level, respectively; 2) the performance of three classification models is evaluated and compared based on their experimental results.

Sentiment Polarity Categorization Process.

The rest of this paper is organized as follows: In section ‘Background and literature review’, we provide a brief review of related work on sentiment analysis. The software package and classification models used in this study are presented in section ‘Methods’. Our detailed approaches for sentiment analysis are proposed in section ‘Research design and methodology’. Experimental results, along with discussion and future work, are presented in section ‘Results and discussion’. Section ‘Conclusion’ concludes the paper.

Background and literature review

One fundamental problem in sentiment analysis is the categorization of sentiment polarity [6, 22-25]. Given a piece of written text, the problem is to categorize the text into one specific sentiment polarity: positive or negative (or neutral). Based on the scope of the text, there are three levels of sentiment polarity categorization, namely the document level, the sentence level, and the entity and aspect level [26]. The document level concerns whether a document, as a whole, expresses negative or positive sentiment, while the sentence level deals with each sentence's sentiment categorization; the entity and aspect level then targets what exactly people like or dislike in their opinions.

Since reviews of much of the work on sentiment analysis have already been included in [26], in this section we only review some previous work upon which our research is essentially based. Hu and Liu [27] summarized a list of positive words and a list of negative words, respectively, based on customer reviews. The positive list contains 2,006 words and the negative list has 4,783 words. Both lists also include some misspelled words that are frequently present in social media content. Sentiment categorization is essentially a classification problem, where features that contain opinions or sentiment information should be identified before the classification. For feature selection, Pang and Lee [5] suggested removing objective sentences by extracting subjective ones. They proposed a text-categorization technique that is able to identify subjective content using minimum cuts. Gann et al. [28] selected 6,799 tokens based on Twitter data, where each token is assigned a sentiment score, namely the TSI (Total Sentiment Index), characterizing it as a positive or a negative token. Specifically, the TSI for a certain token is computed as:

\( \mathrm{TSI}(t) = \frac{p - \frac{tp}{tn}\, n}{p + \frac{tp}{tn}\, n} \)

where p is the number of times the token appears in positive tweets, n is the number of times the token appears in negative tweets, and \(\frac{tp}{tn}\) is the ratio of the total number of positive tweets to the total number of negative tweets.
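As a rough illustration only (this is not code from the paper), the TSI above can be computed per token from hypothetical count data as follows:

```python
# Illustrative sketch: Total Sentiment Index (TSI) per token from hypothetical
# tweet-level counts, following the formula above.
def tsi(p: int, n: int, total_pos: int, total_neg: int) -> float:
    """TSI = (p - r*n) / (p + r*n), where r = total_pos / total_neg."""
    r = total_pos / total_neg
    return (p - r * n) / (p + r * n)

# Hypothetical per-token counts and corpus totals.
pos_counts = {"great": 120, "broken": 3}
neg_counts = {"great": 10, "broken": 95}
total_pos, total_neg = 5000, 4000

for token in pos_counts:
    score = tsi(pos_counts[token], neg_counts[token], total_pos, total_neg)
    print(f"{token}: TSI = {score:+.2f}")   # positive tokens > 0, negative < 0
```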

Research design and methodology

Data collection.

Data used in this paper is a set of product reviews collected from Amazon.com. From February to April 2014 we collected, in total, over 5.1 million product reviews [b], in which the products belong to 4 major categories: beauty, book, electronic, and home (Figure 3(a)). These online reviews were posted by over 3.2 million reviewers (customers) on 20,062 products. Each review includes the following information: 1) reviewer ID; 2) product ID; 3) rating; 4) time of the review; 5) helpfulness; 6) review text. Every rating is based on a 5-star scale (Figure 3(b)), so all ratings range from 1 star to 5 stars, with no half-star or quarter-star ratings.

Data collection (a) Data based on product categories (b) Data based on review categories.

Sentiment sentences extraction and POS tagging

Pang and Lee [5] suggest that all objective content should be removed for sentiment analysis. Instead of removing objective content, in our study all subjective content was extracted for further analysis. The subjective content consists of all sentiment sentences. A sentiment sentence is one that contains at least one positive or negative word. All of the sentences were first tokenized into separate English words.

Every word of a sentence has a syntactic role that defines how the word is used. These syntactic roles are also known as parts of speech. There are 8 parts of speech in English: the verb, the noun, the pronoun, the adjective, the adverb, the preposition, the conjunction, and the interjection. In natural language processing, part-of-speech (POS) taggers [29-31] have been developed to classify words based on their parts of speech. For sentiment analysis, a POS tagger is very useful for two reasons: 1) words such as nouns and pronouns usually do not contain any sentiment, and they can be filtered out with the help of a POS tagger; 2) a POS tagger can also be used to distinguish words that can be used in different parts of speech. For instance, "enhanced" may convey a different amount of sentiment as a verb than as an adjective. The POS tagger used for this research is a max-entropy POS tagger developed for the Penn Treebank Project [31]. The tagger provides 46 different tags, meaning it can identify much more detailed syntactic roles than the basic 8. As an example, Table 1 lists all of the tags for verbs that are included in the POS tagger.

Each sentence was then tagged using the POS tagger. Given the enormous number of sentences, a Python program that runs in parallel was written to improve the speed of tagging. As a result, over 25 million adjectives, over 22 million adverbs, and over 56 million verbs were tagged out of all the sentiment sentences, because adjectives, adverbs, and verbs are the words that mainly convey sentiment.
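The paper's own max-entropy tagger is not reproduced here, but the same Penn Treebank-style tagging can be sketched with NLTK's default tagger (an assumption, used only to show the tag families JJ, RB, and VB* that the study keeps):

```python
# Sketch: POS-tag a sentiment sentence and keep the tag classes that mainly
# convey sentiment (adjectives JJ*, adverbs RB*, verbs VB*). This uses NLTK's
# default tagger, not the paper's max-entropy tagger.
import nltk

nltk.download("punkt", quiet=True)
# Note: newer NLTK releases may name this resource "averaged_perceptron_tagger_eng".
nltk.download("averaged_perceptron_tagger", quiet=True)

sentence = "The built in speaker also has its uses but so far nothing revolutionary."
tagged = nltk.pos_tag(nltk.word_tokenize(sentence))   # [(word, tag), ...]

sentiment_tags = ("JJ", "RB", "VB")
print([(w, t) for w, t in tagged if t.startswith(sentiment_tags)])
```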

Negation phrases identification

Words such as adjectives and verbs are able to convey the opposite sentiment with the help of negative prefixes. For instance, consider the following sentence found in an electronic device's review: "The built in speaker also has its uses but so far nothing revolutionary." The word "revolutionary" is a positive word according to the list in [27]. However, the phrase "nothing revolutionary" gives a more or less negative feeling. Therefore, it is crucial to identify such phrases. In this work, two types of phrases are identified, namely negation-of-adjective (NOA) and negation-of-verb (NOV).

Most common negative prefixes such as not, no, or nothing are treated as adverbs by the POS tagger. Hence, we propose Algorithm 1 for phrase identification. The algorithm was able to identify 21,586 different phrases, with a total occurrence of over 0.68 million, each of which has a negative prefix. Table 2 lists the top 5 NOA and NOV phrases by occurrence.
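Algorithm 1 itself is not reproduced in this excerpt; the simplified sketch below captures only the basic rule one would expect it to apply, pairing a negation word with the adjective (NOA) or verb (NOV) that immediately follows it.

```python
# Simplified sketch of negation-phrase identification (NOA / NOV) over
# POS-tagged tokens. The published Algorithm 1 is more elaborate; this only
# handles the adjacent "negation word + adjective/verb" case.
NEGATIONS = {"not", "no", "nothing", "never"}

def find_negation_phrases(tagged):
    """Return (phrase, kind) pairs, with kind in {'NOA', 'NOV'}."""
    phrases = []
    for (w1, _), (w2, t2) in zip(tagged, tagged[1:]):
        if w1.lower() in NEGATIONS:
            if t2.startswith("JJ"):
                phrases.append((f"{w1} {w2}", "NOA"))
            elif t2.startswith("VB"):
                phrases.append((f"{w1} {w2}", "NOV"))
    return phrases

print(find_negation_phrases([("nothing", "RB"), ("revolutionary", "JJ")]))
# [('nothing revolutionary', 'NOA')]
```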

Sentiment score computation for sentiment tokens

A sentiment token is a word or a phrase that conveys sentiment. Given the sentiment words proposed in [27], a word token consists of a positive (or negative) word and its part-of-speech tag. In total, we selected 11,478 word tokens, each of which occurs at least 30 times throughout the dataset. For phrase tokens, 3,023 of the 21,586 identified sentiment phrases were selected, each of which also occurs at least 30 times. Given a token t, the formula for computing t's sentiment score (SS) is given as:

\( SS(t) = \frac{\sum_{i=1}^{5} i \times \gamma_{5,i} \times Occurrence_{i}(t)}{\sum_{i=1}^{5} \gamma_{5,i} \times Occurrence_{i}(t)} \)

where \(Occurrence_{i}(t)\) is t's number of occurrences in i-star reviews, with i = 1,...,5. According to Figure 3, our dataset is not balanced, meaning that a different number of reviews was collected for each star level. Since 5-star reviews account for the majority of the entire dataset, we introduce a ratio, \(\gamma_{5,i}\), which is defined as:

\( \gamma_{5,i} = \frac{\text{number of 5-star reviews}}{\text{number of } i\text{-star reviews}} \)

Therefore, if the dataset were balanced, \(\gamma_{5,i}\) would be set to 1 for every i. Consequently, every sentiment score falls into the interval [1,5]. For positive word tokens, we expect the median of their sentiment scores to exceed 3, which is the neutral point according to Figure 1. For negative word tokens, we expect the median to be less than 3.
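For illustration only (the counts below are invented, and the code simply restates the two formulas above):

```python
# Sketch: sentiment score SS(t) for one token, weighted by gamma_{5,i} to
# compensate for the imbalance across star levels.
def sentiment_score(token_occurrences, review_counts):
    """token_occurrences[i] and review_counts[i] refer to (i+1)-star reviews."""
    gammas = [review_counts[4] / review_counts[i] for i in range(5)]
    weighted = [g * occ for g, occ in zip(gammas, token_occurrences)]
    return sum((i + 1) * w for i, w in enumerate(weighted)) / sum(weighted)

token_occurrences = [5, 10, 40, 300, 900]                      # hypothetical counts
review_counts = [60_000, 80_000, 160_000, 700_000, 3_000_000]  # hypothetical totals

print(f"SS = {sentiment_score(token_occurrences, review_counts):.2f}")  # falls in [1, 5]
```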

As a result, the sentiment score information for positive word tokens is shown in Figure 4(a). The histogram describes the distribution of scores, while the box plot shows that the median is above 3. Similarly, the box plot in Figure 4(b) shows that the median of sentiment scores for negative word tokens is lower than 3. In fact, both the mean and the median of positive word tokens exceed 3, and both values are lower than 3 for negative word tokens (Table 3).

Sentiment score information for word tokens (a) Positive word tokens (b) Negative word tokens.

The ground truth labels

The process of sentiment polarity categorization is twofold: sentence-level categorization and review-level categorization. Given a sentence, the goal of sentence-level categorization is to classify it as positive or negative in terms of the sentiment it conveys. Training data for this categorization process require ground truth tags indicating the positiveness or negativeness of a given sentence. However, ground truth tagging is a challenging problem due to the amount of data we have. Since manually tagging each sentence is infeasible, a machine tagging approach is adopted as a solution. The approach implements a bag-of-words model that simply counts the appearances of positive and negative (word) tokens in every sentence. If there are more positive tokens than negative ones, the sentence is tagged as positive, and vice versa. This approach is similar to the one used for tagging the Sentiment 140 Tweet Corpus. Training data for review-level categorization already have ground truth tags, namely the star-scaled ratings.
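A minimal sketch of this majority-count rule is shown below, with tiny placeholder word lists standing in for the Hu and Liu lexicons:

```python
# Sketch of the machine-tagging rule: count positive and negative word tokens
# and label the sentence by majority. Word lists are tiny placeholders.
POSITIVE = {"great", "love", "revolutionary", "excellent"}
NEGATIVE = {"bad", "broken", "terrible", "waste"}

def machine_tag(sentence: str) -> str:
    words = sentence.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos == neg:
        return "skip"   # a tie carries no clear polarity
    return "positive" if pos > neg else "negative"

print(machine_tag("Great sound quality, love it"))        # positive
print(machine_tag("Arrived broken, terrible support"))    # negative
```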

Feature vector formation

Sentiment tokens and sentiment scores are information extracted from the original dataset. They are also known as features, which will be used for sentiment categorization. In order to train the classifiers, each entry of training data needs to be transformed into a vector that contains those features, namely a feature vector. For sentence-level (review-level) categorization, a feature vector is formed based on a sentence (review). One challenge is to control each vector's dimensionality. The challenge is actually twofold: first, a vector should not contain an abundant number (hundreds or thousands) of features or feature values, because of the curse of dimensionality [32]; second, every vector should have the same number of dimensions in order to fit the classifiers. This challenge particularly applies to sentiment tokens: on one hand, there are 11,478 word tokens as well as 3,023 phrase tokens; on the other hand, vectors cannot be formed by simply including the tokens that appear in a sentence (or a review), because different sentences (or reviews) tend to have different numbers of tokens, so the generated vectors would have different dimensions.

Since we are only concerned with each sentiment token's appearance inside a sentence or a review, two binary strings are used to represent each token's appearance and thereby overcome the challenge. One string with 11,478 bits is used for word tokens, while the other, with a bit length of 3,023, is used for phrase tokens. For instance, if the i-th word (phrase) token appears, the word (phrase) string's i-th bit is flipped from "0" to "1". Finally, instead of directly saving the flipped strings into a feature vector, a hash value of each string is computed using Python's built-in hash function and saved. Hence, a sentence-level feature vector has four elements in total: two hash values computed from the flipped binary strings, an averaged sentiment score, and a ground truth label. One more element is included exclusively in review-level vectors: given a review with m positive sentences and n negative sentences, the value of the element is computed as −1× m +1× n.
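A compressed sketch of this construction follows, with tiny placeholder vocabularies (the real vocabularies have 11,478 word tokens and 3,023 phrase tokens):

```python
# Sketch: build a sentence-level feature vector from two appearance bit-strings
# (hashed), the averaged sentiment score, and the ground truth label.
WORD_TOKENS = ["great_JJ", "broken_JJ", "love_VB"]        # placeholder vocabulary
PHRASE_TOKENS = ["not work", "nothing revolutionary"]      # placeholder vocabulary

def feature_vector(tokens_present, phrases_present, avg_score, label):
    word_bits = "".join("1" if t in tokens_present else "0" for t in WORD_TOKENS)
    phrase_bits = "".join("1" if p in phrases_present else "0" for p in PHRASE_TOKENS)
    return [hash(word_bits), hash(phrase_bits), avg_score, label]

print(feature_vector({"great_JJ"}, set(), avg_score=4.1, label="positive"))
```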

Results and discussion

Evaluation methods.

The performance of each classification model is estimated based on its averaged F1-score:

\( F1_{avg} = \frac{1}{n} \sum_{i=1}^{n} \frac{2 \times P_{i} \times R_{i}}{P_{i} + R_{i}} \)

where P i is the precision of the i th class, R i is the recall of the i th class, and n is the number of classes. P i and R i are evaluated using 10-fold cross validation. A 10-fold cross validation is applied as follows: A dataset is partitioned into 10 equal size subsets, each of which consists of 10 positive class vectors and 10 negative class vectors. Of the 10 subsets, a single subset is retained as the validation data for testing the classification model, and the remaining 9 subsets are used as training data. The cross-validation process is then repeated 10 times, with each of the 10 subsets used exactly once as the validation data. The 10 results from the folds are then averaged to produce a single estimation. Since training data are labeled under two classes (positive and negative) for the sentence-level categorization, ROC (Receiver Operating Characteristic) curves are also plotted for a better performance comparison.
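For illustration, the same protocol can be sketched with scikit-learn on synthetic stand-in data (the study's real feature vectors are not reproduced here):

```python
# Sketch: macro-averaged F1 estimated by 10-fold cross-validation.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))            # 200 synthetic 4-dimensional vectors
y = rng.integers(0, 2, size=200)         # synthetic binary labels

clf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(clf, X, y, cv=10, scoring="f1_macro")
print(f"averaged F1 over 10 folds: {scores.mean():.3f}")
```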

Sentence-level categorization

Result on manually-labeled sentences.

200 feature vectors are formed based on the 200 manually labeled sentences. As a result, the classification models show the same level of performance based on their F1-scores, with all three scores taking the same value of 0.85. With the help of the ROC curves (Figure 5), it is clear that all three models performed quite well on testing data with high posterior probability. (The posterior probability of a testing data point A is estimated by the classification model as the probability that A will be classified as positive, denoted as P(+|A).) As the probability gets lower, the Naïve Bayesian classifier outperforms the SVM classifier, with a larger area under the curve. In general, the Random Forest model performs the best.

ROC curves based on the manually labeled set.

Result on machine-labeled sentences

2 million feature vectors (1 million with positive labels and 1 million with negative labels) are generated from the 2 million machine-labeled sentences, known as the complete set. Four subsets are obtained from the complete set, with subset A containing 200 vectors, subset B 2,000 vectors, subset C 20,000 vectors, and subset D 200,000 vectors, respectively. The number of vectors with positive labels equals the number of vectors with negative labels in every subset. The performance of the classification models is then evaluated on five different vector sets (the four subsets and the complete set, Figure 6).

F1 scores of sentence-level categorization.

As the models receive more training data, their F1 scores all increase. The SVM model shows the most significant improvement, from 0.61 to 0.94, as its training data grow from 180 to 1.8 million vectors. The model outperforms the Naïve Bayesian model and becomes the second-best classifier on subset C and the complete set. The Random Forest model again performs the best on datasets of all sizes. Figure 7 shows the ROC curves plotted based on the results for the complete set.

ROC curves based on the complete set.

Review-level categorization

3 million feature vectors are formed for this categorization. Vectors generated from reviews with ratings of at least 4 stars are labeled as positive, vectors generated from 1-star and 2-star reviews are labeled as negative, and 3-star reviews are used to prepare neutral-class vectors. As a result, this complete set of vectors is uniformly labeled into three classes: positive, neutral, and negative. In addition, four subsets are obtained from the complete set, with subset A containing 300 vectors, subset B 3,000 vectors, subset C 30,000 vectors, and subset D 300,000 vectors, respectively.

Figure 8 shows the F1 scores obtained on the different sizes of vector sets. It can be clearly observed that the SVM model and the Naïve Bayesian model are identical in terms of their performance. Both models are generally superior to the Random Forest model on all vector sets. However, neither model reaches the level of performance achieved in sentence-level categorization, due to their relatively low performance on the neutral class.

F1 scores of review-level categorization.

The experimental results are promising, for both the sentence-level categorization and the review-level categorization. It was observed that the averaged sentiment score is a strong feature by itself, since it is able to achieve an F1 score over 0.8 for the sentence-level categorization on the complete set. For the review-level categorization on the complete set, the feature is capable of producing an F1 score over 0.73. However, there are still a couple of limitations to this study. The first is that review-level categorization becomes difficult if we want to classify reviews into their specific star-scaled ratings. In other words, F1 scores obtained from such experiments are fairly low, with values lower than 0.5. The second limitation is that, since the sentiment analysis scheme proposed in this study relies on the occurrence of sentiment tokens, the scheme may not work well for reviews that contain purely implicit sentiment. An implicit sentiment is usually conveyed through neutral words, making judgment of its sentiment polarity difficult. For example, a sentence like "Item as described.", which frequently appears in positive reviews, consists of only neutral words.

With those limitations in mind, our future work is to focus on solving those issues. Specifically, more features will be extracted and grouped into feature vectors to improve review-level categorizations. For the issue of implicit sentiment analysis, our next step is to be able to detect the existence of such sentiment within the scope of a particular product. More future work includes testing our categorization scheme using other datasets.

Conclusion

Sentiment analysis or opinion mining is a field of study that analyzes people’s sentiments, attitudes, or emotions towards certain entities. This paper tackles a fundamental problem of sentiment analysis, sentiment polarity categorization. Online product reviews from Amazon.com are selected as data used for this study. A sentiment polarity categorization process (Figure 2) has been proposed along with detailed descriptions of each step. Experiments for both sentence-level categorization and review-level categorization have been performed.

Methods

Software used for this study is scikit-learn [33], an open source machine learning software package in Python. The classification models selected for categorization are: Naïve Bayesian, Random Forest, and Support Vector Machine [32].
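The three classifiers can be instantiated in scikit-learn roughly as follows; this is a sketch with default hyperparameters and synthetic stand-in data, not the paper's exact configuration.

```python
# Sketch: the three classification models named above, on synthetic data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X_train, y_train = rng.normal(size=(400, 4)), rng.integers(0, 2, size=400)
X_test = rng.normal(size=(5, 4))

models = {
    "Naive Bayes": GaussianNB(),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=1),
    "SVM (RBF kernel)": SVC(kernel="rbf"),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, model.predict(X_test))
```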

Naïve Bayesian classifier

The Naïve Bayesian classifier works as follows: Suppose there exists a set of training data, D, in which each tuple is represented by an n-dimensional feature vector, \(X = (x_1, x_2, \ldots, x_n)\), indicating n measurements made on the tuple from n attributes or features. Assume that there are m classes, \(C_1, C_2, \ldots, C_m\). Given a tuple X, the classifier will predict that X belongs to \(C_i\) if and only if \(P(C_i|X) > P(C_j|X)\) for all \(j \in [1, m]\) with \(j \neq i\). \(P(C_i|X)\) is computed as:

\( P(C_i|X) = \frac{P(X|C_i)\, P(C_i)}{P(X)} \)

Random forest

The random forest classifier was chosen due to its superior accuracy over a single decision tree. It is essentially an ensemble method based on bagging. The classifier works as follows: Given D, the classifier first creates k bootstrap samples of D, each denoted \(D_i\). Each \(D_i\) has the same number of tuples as D, sampled with replacement from D. Sampling with replacement means that some of the original tuples of D may not be included in \(D_i\), whereas others may occur more than once. The classifier then constructs a decision tree based on each \(D_i\). As a result, a "forest" of k decision trees is formed. To classify an unknown tuple X, each tree returns its class prediction, which counts as one vote. The final class of X is the one that receives the most votes.

The decision tree algorithm implemented in scikit-learn is CART (Classification and Regression Trees). CART uses Gini index for its tree induction. For D , the Gini index is computed as:

\( Gini(D) = 1 - \sum_{i=1}^{m} p_i^{2} \)

where \(p_i\) is the probability that a tuple in D belongs to class \(C_i\). The Gini index measures the impurity of D. The lower the index value, the better D was partitioned. For a detailed description of CART, please see [32].
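A direct implementation of this index, for a partition D given as a list of class labels, is a few lines:

```python
# Gini index of a set of class labels: 1 - sum_i p_i^2.
from collections import Counter

def gini(labels):
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())

print(gini(["pos"] * 50 + ["neg"] * 50))   # 0.5   (maximally impure for 2 classes)
print(gini(["pos"] * 95 + ["neg"] * 5))    # 0.095 (nearly pure)
```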

Support vector machine

Support vector machine (SVM) is a method for the classification of both linear and nonlinear data. If the data is linearly separable, the SVM searches for the linear optimal separating hyperplane (the linear kernel), which is a decision boundary that separates data of one class from another. Mathematically, a separating hyperplane can be written as \(W \cdot X + b = 0\), where W is a weight vector, \(W = (w_1, w_2, \ldots, w_n)\), X is a training tuple, and b is a scalar. To optimize the hyperplane, the problem essentially transforms to the minimization of \(\lVert W \rVert\), where W is eventually computed as \(W = \sum_{i=1}^{n} \alpha_i y_i X_i\), with numeric parameters \(\alpha_i\) and labels \(y_i\) based on the support vectors \(X_i\). That is: if \(y_i = 1\) then \(W \cdot X_i + b \geq 1\); if \(y_i = -1\) then \(W \cdot X_i + b \leq -1\).

If the data is linearly inseparable, the SVM uses nonlinear mapping to transform the data into a higher dimension. It then solves the problem by finding a linear hyperplane. Functions that perform such transformations are called kernel functions. The kernel function selected for our experiment is the Gaussian Radial Basis Function (RBF):

\( K(X_i, X_j) = e^{-\gamma \lVert X_i - X_j \rVert^{2}} \)

where \(X_i\) are support vectors, \(X_j\) are testing tuples, and γ is a free parameter that uses the default value from scikit-learn in our experiment. Figure 9 shows a classification example of SVM based on the linear kernel and the RBF kernel.

A Classification Example of SVM.
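As an illustrative aside (not the paper's experiment), the difference between the two kernels can be seen on a toy dataset that is not linearly separable, such as scikit-learn's concentric circles:

```python
# Sketch: linear vs. RBF kernel on concentric circles, where no separating
# hyperplane exists in the original feature space.
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_circles(n_samples=400, factor=0.3, noise=0.1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for kernel in ("linear", "rbf"):
    clf = SVC(kernel=kernel).fit(X_tr, y_tr)   # gamma left at scikit-learn's default
    print(f"{kernel} kernel accuracy: {clf.score(X_te, y_te):.2f}")
```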

[a] Even though there are papers talking about spam on Amazon.com, we still contend that it is a relatively spam-free website in terms of reviews because of the enforcement of its review inspection process.

[b] The product review data used for this work can be downloaded at: http://www.itk.ilstu.edu/faculty/xfang13/amazon_data.htm

References

[1] Kim S-M, Hovy E (2004) Determining the sentiment of opinions. In: Proceedings of the 20th International Conference on Computational Linguistics, page 1367. Association for Computational Linguistics, Stroudsburg, PA, USA.

[2] Liu B (2010) Sentiment analysis and subjectivity. In: Handbook of Natural Language Processing, Second Edition. Taylor and Francis Group, Boca Raton.

[3] Liu B, Hu M, Cheng J (2005) Opinion observer: analyzing and comparing opinions on the web. In: Proceedings of the 14th International Conference on World Wide Web, WWW '05, 342–351. ACM, New York, NY, USA.

[4] Pak A, Paroubek P (2010) Twitter as a corpus for sentiment analysis and opinion mining. In: Proceedings of the Seventh Conference on International Language Resources and Evaluation. European Language Resources Association, Valletta, Malta.

[5] Pang B, Lee L (2004) A sentimental education: sentiment analysis using subjectivity summarization based on minimum cuts. In: Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics, ACL '04. Association for Computational Linguistics, Stroudsburg, PA, USA.

[6] Pang B, Lee L (2008) Opinion mining and sentiment analysis. Found Trends Inf Retr 2(1-2): 1–135.

[7] Turney PD (2002) Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. In: Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, ACL '02, 417–424. Association for Computational Linguistics, Stroudsburg, PA, USA.

[8] Whitelaw C, Garg N, Argamon S (2005) Using appraisal groups for sentiment analysis. In: Proceedings of the 14th ACM International Conference on Information and Knowledge Management, CIKM '05, 625–631. ACM, New York, NY, USA.

[9] Twitter (2014) Twitter APIs. https://dev.twitter.com/start

[10] Liu B (2014) The science of detecting fake reviews. http://content26.com/blog/bing-liu-the-science-of-detecting-fake-reviews/

[11] Jindal N, Liu B (2008) Opinion spam and analysis. In: Proceedings of the 2008 International Conference on Web Search and Data Mining, WSDM '08, 219–230. ACM, New York, NY, USA.

[12] Mukherjee A, Liu B, Glance N (2012) Spotting fake reviewer groups in consumer reviews. In: Proceedings of the 21st International Conference on World Wide Web, WWW '12, 191–200. ACM, New York, NY, USA.

[13] Stanford (2014) Sentiment 140. http://www.sentiment140.com/

[14] Amazon. http://www.amazon.com

[15] Go A, Bhayani R, Huang L (2009) Twitter sentiment classification using distant supervision, 1–12. CS224N Project Report, Stanford.

[16] Lin Y, Zhang J, Wang X, Zhou A (2012) An information theoretic approach to sentiment polarity classification. In: Proceedings of the 2nd Joint WICOW/AIRWeb Workshop on Web Quality, WebQuality '12, 35–40. ACM, New York, NY, USA.

[17] Sarvabhotla K, Pingali P, Varma V (2011) Sentiment classification: a lexical similarity based approach for extracting subjectivity in documents. Inf Retrieval 14(3): 337–353.

[18] Wilson T, Wiebe J, Hoffmann P (2005) Recognizing contextual polarity in phrase-level sentiment analysis. In: Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, 347–354. Association for Computational Linguistics, Stroudsburg, PA, USA.

[19] Yu H, Hatzivassiloglou V (2003) Towards answering opinion questions: separating facts from opinions and identifying the polarity of opinion sentences. In: Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing, 129–136. Association for Computational Linguistics, Stroudsburg, PA, USA.

[20] Zhang Y, Xiang X, Yin C, Shang L (2013) Parallel sentiment polarity classification method with substring feature reduction. In: Trends and Applications in Knowledge Discovery and Data Mining, volume 7867 of Lecture Notes in Computer Science, 121–132. Springer Berlin Heidelberg, Heidelberg, Germany.

[21] Zhou S, Chen Q, Wang X (2013) Active deep learning method for semi-supervised sentiment classification. Neurocomputing 120: 536–546.

[22] Chesley P, Vincent B, Xu L, Srihari RK (2006) Using verbs and adjectives to automatically classify blog sentiment. Training 580(263): 233.

[23] Choi Y, Cardie C (2009) Adapting a polarity lexicon using integer linear programming for domain-specific sentiment classification. In: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2, EMNLP '09, 590–598. Association for Computational Linguistics, Stroudsburg, PA, USA.

[24] Jiang L, Yu M, Zhou M, Liu X, Zhao T (2011) Target-dependent Twitter sentiment classification. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1, 151–160. Association for Computational Linguistics, Stroudsburg, PA, USA.

[25] Tan LK-W, Na J-C, Theng Y-L, Chang K (2011) Sentence-level sentiment polarity classification using a linguistic approach. In: Digital Libraries: For Cultural Heritage, Knowledge Dissemination, and Future Creation, 77–87. Springer, Heidelberg, Germany.

[26] Liu B (2012) Sentiment Analysis and Opinion Mining. Synthesis Lectures on Human Language Technologies. Morgan & Claypool Publishers.

[27] Hu M, Liu B (2004) Mining and summarizing customer reviews. In: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 168–177. ACM, New York, NY, USA.

[28] Gann W-JK, Day J, Zhou S (2014) Twitter analytics for insider trading fraud detection system. In: Proceedings of the Second ASE International Conference on Big Data. ASE.

[29] Roth D, Zelenko D (1998) Part of speech tagging using a network of linear separators. In: COLING-ACL, The 17th International Conference on Computational Linguistics, 1136–1142.

[30] Toutanova K (2003) Stanford log-linear part-of-speech tagger. http://nlp.stanford.edu/software/tagger.shtml

[31] Marcus M (1996) UPenn part of speech tagger. http://www.cis.upenn.edu/~treebank/home.html

[32] Han J, Kamber M, Pei J (2006) Data Mining: Concepts and Techniques, 2nd edn. Morgan Kaufmann, San Francisco, CA, USA.

[33] Scikit-learn (2014). http://scikit-learn.org/stable/


Acknowledgements

This research was partially supported by the following grants: NSF No. 1137443, NSF No. 1247663, NSF No. 1238767, DoD No. W911NF-13-0130, DoD No. W911NF-14-1-0119, and the Data Science Fellowship Award by the National Consortium for Data Science.

Author information

Authors and affiliations.

Department of Computer Science, North Carolina A&T State University, Greensboro, NC, USA

Xing Fang & Justin Zhan


Corresponding author

Correspondence to Xing Fang .

Additional information

Competing interests.

The authors declare that they have no competing interests.

Authors’ contributions

XF performed the primary literature review, data collection, experiments, and also drafted the manuscript. JZ worked with XF to develop the article's framework and focus. All authors read and approved the final manuscript.

Authors’ information

Xing Fang is a Ph.D. candidate at the Department of Computer Science, North Carolina A&T State University. His research interests include social computing, machine learning, and natural language processing. Mr. Fang holds one Master’s degree in computer science from North Carolina A&T State University, and one Baccalaureate degree in electronic engineering from Northwestern Polytechnical University, Xi’an, China.

Dr. Justin Zhan is an associate professor at the Department of Computer Science, North Carolina A&T State University. He has previously been a faculty member at Carnegie Mellon University and National Center for the Protection of Financial Infrastructure in Dakota State University. His research interests include Big Data, Information Assurance, Social Computing, and Health Science.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License ( https://creativecommons.org/licenses/by/4.0 ), which permits use, duplication, adaptation, distribution, and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


About this article

Cite this article.

Fang, X., Zhan, J. Sentiment analysis using product review data. Journal of Big Data 2 , 5 (2015). https://doi.org/10.1186/s40537-015-0015-2


Received : 12 January 2015

Accepted : 20 April 2015

Published : 16 June 2015

DOI : https://doi.org/10.1186/s40537-015-0015-2


Keywords: Sentiment analysis; Sentiment polarity categorization; Natural language processing; Product reviews


A Survey on Sentiment Analysis


Source: Emmanuel Saez and Gabriel Zucman • Note: Tax rates shown include levies paid at all levels of government. Government transfers such as Social Security benefits have not been subtracted.

In the 1960s, the 400 richest Americans paid more than half of their income in taxes. Higher tax rates for the wealthy kept inequality in check and helped fund the creation of social safety nets like Medicare, Medicaid and food stamps.

Today, the superrich control a greater share of America’s wealth than during the Gilded Age of Carnegies and Rockefellers. That's partly because taxes on the wealthy have cratered. In 2018, America's top billionaires paid just 23 percent of their income in taxes.

For the first time in the history of the United States, billionaires had a lower effective tax rate than working-class Americans.

Guest Essay

It’s Time to Tax the Billionaires

By Gabriel Zucman

Gabriel Zucman is an economist at the Paris School of Economics and the University of California, Berkeley.

Until recently, it was hard to know just how good the superrich are at avoiding taxes. Public statistics are oddly quiet about their contributions to government coffers, a topic of legitimate interest in democratic societies.

Over the past few years, I and other scholars have published studies and books attempting to fix that problem. While we still have data for only a handful of countries, we’ve found that the ultrawealthy consistently avoid paying their fair share in taxes. In the Netherlands, for instance, the average taxpayer in 2016 gave 45 percent of earnings to the government, while billionaires paid just 17 percent.

[Chart: Billionaires avoid taxes outside the United States, too — total tax rates for lower earners (0–50th percentile), middle earners (51st–90th percentile), high earners (90th–99.99th percentile) and billionaires, shown for the United States and the Netherlands, among other countries.]

Sources: Demetrio Guzzardi, et al., Journal of the European Economic Association; Emmanuel Saez and Gabriel Zucman; Institut des Politiques Publiques; Netherlands Bureau for Economic Policy Analysis

Note: Data is from 2015 for Italy; 2016 for the Netherlands and France; 2018 for the United States.

Why do the world’s most fortunate people pay among the least in taxes, relative to the amount of money they make?

The simple answer is that while most of us live off our salaries, tycoons like Jeff Bezos live off their wealth. In 2019, when Mr. Bezos was still Amazon’s chief executive, he took home an annual salary of just $81,840. But he owns roughly 10 percent of the company, which made a profit of $30 billion in 2023.

If Amazon gave its profits back to shareholders as dividends, which are subject to income tax, Mr. Bezos would face a hefty tax bill. But Amazon does not pay dividends to its shareholders. Neither does Berkshire Hathaway or Tesla. Instead, the companies keep their profits and reinvest them, making their shareholders even wealthier.

Unless Mr. Bezos, Warren Buffett or Elon Musk sell their stock, their taxable income is relatively minuscule. But they can still make eye-popping purchases by borrowing against their assets. Mr. Musk, for example, used his shares in Tesla as collateral to rustle up around $13 billion in tax-free loans to put toward his acquisition of Twitter.


Jeff Bezos arriving for a news conference after flying into space in the Blue Origin New Shepard rocket on July 20, 2021.

Getty Images

Outside the United States, avoiding taxation can be even easier.

Take Bernard Arnault, the wealthiest person in the world. Mr. Arnault’s shares in LVMH, the luxury goods conglomerate, officially belong to holding companies that he controls. In 2023, Mr. Arnault’s holdings received about $3 billion in dividends from LVMH. France — like other European countries — barely taxes these dividends, because on paper they are received by companies. Yet Mr. Arnault can spend the money almost as if it were deposited directly into his bank account, so long as he works through other incorporated entities — on philanthropy, for instance, or to keep his megayacht afloat or to buy more companies.

Historically, the rich had to pay hefty taxes on corporate profits, the main source of their income. And the wealth they passed on to their heirs was subject to the estate tax. But both taxes have been gutted in recent decades. In 2018, the United States cut its maximum corporate tax rate to 21 percent from 35 percent. And the estate tax has almost disappeared in America. Relative to the wealth of U.S. households, it generates only a quarter of the tax revenues it raised in the 1970s.

[Chart: The falling U.S. corporate tax rate, with annotations marking the Reagan and Trump tax cuts.]

Source: Internal Revenue Service

Note: Tax rates are for each year’s highest corporate income bracket.

So what should be done?

One obstacle to taxing the very rich is the risk they may move to low-tax countries. In Europe, some billionaires who built their fortune in France, Sweden or Germany have established residency in Switzerland, where they pay a fraction of what they would owe in their home country. Although few of the ultrawealthy actually move their homes, the possibility that they might has been a boogeyman for would-be tax reformers.

There is a way to make tax dodging less attractive: a global minimum tax. In 2021, more than 130 countries agreed to apply a minimum tax rate of 15 percent on the profits of large multinational companies. So no matter where a company parks its profits, it still has to pay at least a baseline amount of tax under the agreement.

In February, I was invited to a meeting of Group of 20 finance ministers to present a proposal for another coordinated minimum tax — this one not on corporations, but on billionaires. The idea is simple. Let’s agree that billionaires should pay income taxes equivalent to a small portion — say, 2 percent — of their wealth each year. Someone like Bernard Arnault, who is worth about $210 billion, would have to pay an additional tax equal to roughly $4.2 billion if he pays no income tax. In total, the proposal would allow countries to collect an estimated $250 billion in additional tax revenue per year, which is even more than what the global minimum tax on corporations is expected to add.


Bernard Arnault watching the men’s singles final at the French Open on June 8, 2014.

Abaca Press

Critics might say that this is a wealth tax, the constitutionality of which is debated in the United States. In reality, the proposal stays firmly in the realm of income taxation. Billionaires who already pay the baseline amount of income tax would have no extra tax to pay. The goal is that only those who dial down their income to dodge the income tax would be affected.

Critics also claim that a minimum tax would be too hard to apply because wealth is difficult to value. This fear is overblown. According to my research, about 60 percent of U.S. billionaires’ wealth is in stocks of publicly traded companies. The rest is mostly ownership stakes in private businesses, which can be assigned a monetary value by looking at how the market values similar firms.

One challenge to making a minimum tax work is ensuring broad participation. In the multinational minimum tax agreement, participating countries are allowed to overtax companies from nations that haven’t signed on. This incentivizes every country to join the agreement. The same mechanism should be used for billionaires. For example, if Switzerland refuses to tax the superrich who live there, other countries could tax them on its behalf.

We are already seeing some movement on the issue. Countries such as Brazil, which is chairing the Group of 20 summit this year and has shown extraordinary leadership on the issue, and France, Germany, South Africa and Spain have recently expressed support for a minimum tax on billionaires. In the United States, President Biden has proposed a billionaire tax that shares the same objectives.

To be clear, this proposal wouldn’t increase taxes for doctors, lawyers, small-business owners or the rest of the world’s upper middle class. I’m talking about asking a very small number of stratospherically wealthy individuals — about 3,000 people — to give a relatively tiny bit of their profits back to the governments that fund their employees’ educations and health care and allow their businesses to operate and thrive.

The idea that billionaires should pay a minimum amount of income tax is not a radical idea. What is radical is continuing to allow the wealthiest people in the world to pay a smaller percentage in income tax than nearly everybody else. In liberal democracies, a wave of political sentiment is building, focused on rooting out the inequality that corrodes societies. A coordinated minimum tax on the superrich will not fix capitalism. But it is a necessary first step.



Gabriel Zucman is an economist at the Paris School of Economics and the University of California, Berkeley, and a co-author of “The Triumph of Injustice: How the Rich Dodge Taxes and How to Make Them Pay.”



Religious Landscape Study


The RLS, conducted in 2007 and 2014, surveys more than 35,000 Americans from all 50 states about their religious affiliations, beliefs and practices, and social and political views.



Computer Science > Sound

Title: Joint Sentiment Analysis of Lyrics and Audio in Music

Abstract: Sentiment or mood can express itself on various levels in music. In automatic analysis, the actual audio data is usually analyzed, but the lyrics can also play a crucial role in the perception of moods. We first evaluate various models for sentiment analysis based on lyrics and audio separately. The corresponding approaches already show satisfactory results, but they also exhibit weaknesses, the causes of which we examine in more detail. Furthermore, different approaches to combining the audio and lyrics results are proposed and evaluated. Considering both modalities generally leads to improved performance. We investigate misclassifications and (also intentional) contradictions between audio and lyrics sentiment more closely, and identify possible causes. Finally, we address fundamental problems in this research area, such as high subjectivity, lack of data, and inconsistency in emotion taxonomies.


Textual sentiment analysis and description characteristics in crowdfunding success: The case of cybersecurity and IoT industries

  • Research Paper
  • Open access
  • Published: 29 April 2024
  • Volume 34, article number 30 (2024)


  • Abraham Yosipof   ORCID: orcid.org/0000-0002-3176-8982 1 , 2 ,
  • Netanel Drori 3 ,
  • Or Elroy 1 , 4 &
  • Yannis Pierraki 5  


Crowdfunding platforms offer entrepreneurs the opportunity to evaluate their technologies, validate their market, and raise funding. Such platforms also provide technologies with an opportunity to rapidly transition from research to market, which is especially crucial in fast-changing industries. In this study, we investigated how the sentiments expressed in the text of the project campaigns and project characteristics influence the success of crowdfunding in innovative industries such as cybersecurity and the Internet of Things (IoT). We examined 657 cybersecurity and Internet of Things (IoT) projects between 2010 and 2020 that were promoted on Kickstarter and IndieGoGo, two rewards-based crowdfunding platforms. We extracted technological topic attributes that may influence project success and measured the sentiments of project descriptions using a Valence Aware Dictionary and sEntiment Reasoner (VADER) model. We found that the sentiment of the description and the textual topic characteristics are associated with the success of funding campaigns for cybersecurity and IoT projects.
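For readers unfamiliar with VADER, scoring a (hypothetical) campaign description with the vaderSentiment package looks roughly like this; the text and usage are illustrative, not taken from the study's data or code.

```python
# Minimal sketch: VADER polarity scores for an invented campaign description.
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
description = (
    "Our smart lock brings effortless, bank-grade security to every home, "
    "with no wiring and no monthly fees."
)
print(analyzer.polarity_scores(description))
# -> dict with 'neg', 'neu', 'pos', and a normalized 'compound' score
```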


Introduction

Online crowdfunding is a relatively new form of financing for projects, people, and businesses that has received considerable attention from both academics and practitioners in the last decade (Belleflamme et al., 2014 ; Mollick, 2014 ). The crowdfunding model enables a large number of people to contribute small amounts of money to projects in the hope of achieving a combined total amount that meets or surpasses a predetermined funding target set by the project creators. Crowdfunding has its roots in the creative industries, where it was successfully pioneered in the financing of albums and concerts (Gamble et al., 2017 ). Schwienbacher and Larralde ( 2010 ) define crowdfunding as an open call for the provision of financial resources either in the form of donations or in exchange for some form of reward or voting rights to support initiatives for specific purposes. Mollick ( 2014 ) defines crowdfunding as “the efforts by entrepreneurial individuals and groups – cultural, social, and for-profit – to fund their ventures by drawing on relatively small contributions from a relatively large number of individuals using the internet, without standard financial intermediaries.”

Crowdfunding has grown exponentially in recent years and is expected to reach a market size of $28.8 billion by 2025. The concept of crowdfunding is rooted in the broader concept of crowdsourcing, which develops activities using the ideas, feedback, and solutions sourced from the “crowd” (Belleflamme et al., 2014 ). The objective of crowdfunding is to raise funding from the general public, who can then participate in strategic decisions or may even have voting rights, e.g., in the case of equity crowdfunding (Lambert & Schwienbacher, 2010 ). During initial concept and seed phases, companies can use donations and rewards-based crowdfunding (Best et al., 2013 ; Rossi, 2014 ), which became popular thanks to platforms like IndieGoGo in 2008 and Kickstarter in 2009 (Agrawal et al., 2014 ; Ahlers et al., 2015 ; Belleflamme et al., 2014 ; Mollick, 2014 ; Zhang & Chen, 2019 ). During the planning, development, business launch, and early growth stages, crowdfunding may bridge the gap until later capital needs, such as expansion, when traditional forms of financing, like business angels and venture capital funds, become available.

Sentiment and textual analysis have been used by researchers to investigate how emotions and sentiments expressed in pitches of entrepreneurs may influence crowdfunding fundraising success, yielding contradictory results (Mochkabadi & Volkmann, 2020 ; Wang et al., 2017 , 2018 ). Further, previous research on the dynamics of crowdfunding did not distinguish between industries (Mollick, 2014 ). The objective of this paper is to investigate how the sentiment of text in project campaigns, and project topic characteristics, influence crowdfunding success in innovative industries. We focus on projects in the fields of cybersecurity and IoT, as those present high risks in terms of being disrupted or becoming obsolete (Jensen & Özkil, 2018 ; Moore, 2010 ; Zhu et al., 2021 ). Backers face high information asymmetries with respect to evaluating the underlying science of such technologies, as well as the market opportunities available to them. Determining which new crowdfunding projects are likely to be successful is a challenging task that may require specialized knowledge from the investors. Potential investors in such projects may be more prone to sentiments and emotions to compensate for the challenges associated with the novelty of the technologies and the lack of specialized knowledge. The fundamental uniqueness of cybersecurity and IoT projects within a crowdfunding environment is that they rely on the crowd rather than on technology experts, and the crowd is arguably less equipped to make educated investment decisions due to a lack of specialized knowledge. It is therefore unclear whether innovative and unconventional projects, such as cybersecurity and IoT projects, are well-positioned to leverage crowdfunding advantages compared to other, more conventional sectors.

In this study, we examined campaigns listed on the Kickstarter and IndieGoGo platforms between 2010 and 2020. Both platforms are based in the USA but serve entrepreneurs from across the world who engage in fundraising campaigns. We identified 657 campaigns that involve cybersecurity and IoT-related projects. The goals of this study are to investigate how the sentiments derived from the text used in the description of crowdfunding campaigns relate to funding success and to examine whether specific technology topics used by cybersecurity projects may influence campaign success.

We, therefore, make the following contributions. First, we show how the sentiment of the description of cybersecurity and IoT projects affects the campaigns’ success. Second, we demonstrate how the text embedded in the project campaigns, created by the entrepreneurs to identify specific technological topics, is associated with campaign success. Third, we examine whether previous research findings on the drivers of success in crowdfunding generally also hold for projects in the cybersecurity and IoT sectors, even though these sectors require specialized knowledge from investors. Lastly, we contribute to the literature on the arguably under-researched intersection of entrepreneurial finance and specialized projects. The results of this study will benefit technology professionals, potential investors, and companies operating in cybersecurity- and IoT-related technologies.

This study is structured as follows: The second and third sections review the academic literature related to sentiment analysis in crowdfunding research. The fourth section describes the research methodology and data sources. The fifth section presents the results of the analysis. The sixth section discusses key findings and practical implications. The seventh section presents the limitations of the study and provides areas for future research.

Literature background

Crowdfunding dynamics and campaign success

Venture capital scholars have provided an extensive list of factors that lead to successful company fundraising (Baum & Silverman, 2004 ; Shane & Stuart, 2002 ). In this case, potential signals of quality play an important role in investors’ decisions (Spence, 1978 ). In the context of crowdfunding, previous research identified several quality signals that lead to the success of crowdfunding campaigns (Ahlers et al., 2015 ). Many projects lack various types of professional quality aspects, which might be the reason why so many projects do not reach their funding goal (Mochkabadi & Volkmann, 2020 ). Mollick ( 2014 ) analyzed Kickstarter campaigns and found that personal networks and project quality are associated with the success of crowdfunding efforts. In addition, a longer campaign duration decreases the chances of success (Cumming et al., 2017 ; Mollick, 2014 ; Song et al., 2019 ), possibly because a long campaign is a sign of a lack of confidence (Mollick, 2014 ). However, Zheng et al. ( 2014 ) found that the opposite is true on Chinese reward-crowdfunding platforms, where the duration of the campaign is positively associated with success.

Promotion by the platform is strongly associated with success (Song et al., 2019 ), and therefore, projects promoted by a crowdfunding platform, such as Kickstarter’s Staff Picks or Projects We Love, are more likely to succeed. Signals such as videos and frequent updates are associated with greater success, and spelling errors reduce the chance of success (Jensen & Özkil,  2018 ; Wu et al.,  2024 ; Zhang et al., 2023 ). Cumming et al. ( 2017 ) found that the success of cleantech crowdfunding projects likely depends on the number of photos in their gallery, the presence of video pitches, and the length and quality of the project description.

In the case of equity crowdfunding, where investors receive a stake in the company in exchange for their financial support, Hakenes and Schlegel ( 2014 ) found that high funding goals may provide backers with a sense of security, as their investment will only go through if enough other people also choose to back the campaign, which implies that a higher level of due diligence will be performed. However, in the case of reward crowdfunding, which offers backers non-monetary, often tangible rewards in return for their pledges, such as products or experiences, several researchers suggest that higher funding goals lead to lower chances of success (Cumming et al., 2017 ; Jolliffe, 2002 ; Mollick, 2014 ; Zheng et al., 2014 ). In addition, Belleflamme et al. ( 2014 ) found that smaller targets are preferable in rewards-based campaigns and larger targets in equity crowdfunding.

Belleflamme et al. ( 2014 ) also found that companies that offer products are more successful in achieving their funding goals than those offering services, mainly due to the inherent preference of people to invest in tangible outcomes which are perceived as more certain. Furthermore, Härkönen ( 2014 ) suggests that the success of a crowdfunding campaign can be attributed to the ability of the crowd to easily understand the promoted product. In this case, the information provided in the description of the crowdfunding pitch is of particular importance. Following the work of Belleflamme et al. ( 2014 ), this work distinguishes between software and hardware projects to test whether the “tangibility” of a project also plays a role in the success of a campaign.

Akerlof ( 1970 ) described the asymmetry of information using the example of used car sales, where the seller usually has better information on the product. In crowdfunding, the entrepreneur knows more about the project than the investors, which creates uncertainty that is further intensified in the case of projects that also require specialized knowledge. The large number of projects and investors involved in crowdfunding platforms offers a unique environment in which to study information asymmetries when new technologies are involved, as well as the value of the mechanisms that crowdfunding platforms use to mitigate such asymmetries (Cumming et al., 2017 ).

Cybersecurity and IoT

There is no consensual definition of cybersecurity, as it is a broadly used term with highly variable definitions, often subjective, and at times uninformative (Craigen et al., 2014 ). The International Telecommunication Union defines cybersecurity as “the collection of tools, policies, security concepts, security safeguards, guidelines, risk management approaches, actions, training, best practices, assurance and technologies that can be used to protect the cyber environment and organization and user’s assets” (International Telecommunication Union, 2009 ; ITU, 2009 ). Craigen et al. ( 2014 ) define cybersecurity as “the organization and collection of resources, processes, and structures used to protect cyberspace and cyberspace-enabled systems from occurrences that misalign de jure from de facto property rights.” Most definitions emphasize the multidimensional nature of cybersecurity and its relation to organizational, economic, political, and other human dimensions (Goodall et al., 2009 ).

The Internet of Things (IoT) describes the network of physical objects, i.e., “things,” embedded with sensors, software, and other technologies for the purpose of communicating and exchanging data with other devices and systems over the internet. IoT projects are on the rise as a result of the progress in digitization and its positive effect on firms’ performance (Viktora-Jones et al., 2024 ) and as being an important element in digital transformation. IoT projects can be based on software or hardware, where cybersecurity could be a subcategory of IoT (Sorri et al., 2022 ).

Although the importance of companies in the fields of cybersecurity and IoT is enormous today, those companies face several challenges, such as increased legal and industry competition risks, that differentiate them from conventional companies that raise funding from private investors (Zhu et al., 2021 ). Jensen and Özkil ( 2018 ) identified challenges in crowdfunded technology product development that could result in the failure of the crowdfunding campaign. In addition, Molling and Zanela Klein ( 2022 ) found that companies struggle to understand the potential and limitations of IoT to generate appropriate value propositions for their IoT products and services. Crowdfunding platforms such as Kickstarter and IndieGoGo offer a fast transition from research and development to the market, and examining the dynamics of crowdfunding in the IoT and cybersecurity industries is therefore of great importance.

Textual and sentiment analysis in crowdfunding research: hypotheses development

Mochkabadi and Volkmann ( 2020 ) argue that there is great potential in analyzing how the language used in updates and project proposals relates to campaign success. Previous research used textual analysis to identify the role of a project’s description in the success of the campaign. Sentiment is usually related to the self-confidence of the author, where authors with high confidence are more likely to create positive text (Wang et al., 2017 ). Research on Peer-to-Peer (P2P) lending shows that overly confident borrowers may not be able to repay their loans in time (Gao & Lin, 2015 ), indicating that positive sentiment does not always lead to positive outcomes.

Previous research identified two broad categories regarding textual analysis: readability and tone (Dority et al., 2021 ). Several proxies have been used for readability, including word count, language complexity, spelling and grammatical errors, and more. The relation between the length of the pitch and the funding success is not clear. Several studies found a positive relationship between the number of words in the pitch and funding success (Cumming et al., 2017 ; Zhou et al., 2018 ), meaning that a more detailed and longer description increases the success rate. However, other studies found a negative relationship (Horvát et al., 2018 ) or a U-shaped relationship (Nowak et al., 2018 ). To contribute to this debate, we also investigated the length of the title and the description of the projects.

The Flesch-Kincaid readability tests are designed to indicate how difficult a text in English is to understand and are commonly used to assess the readability level of text. These tests assess the difficulty of reading the given text based on several constants and the number of words, sentences, and syllables, as well as the school grade level a reader needs in order to understand it. Block et al. ( 2018 ) used the Flesch Readability Index to measure the language complexity of campaign updates. They found that updates with simpler language significantly increased the number of investments made during the campaign. Simpson’s Diversity Index is a measure used to quantify the diversity or richness of species within a community, taking into account both the number of different species present and their relative abundance. To investigate the impact of readability on funding success, Nowak et al. ( 2018 ) used Simpson’s Diversity Index to measure the diversity of the language used in the description of a loan. The Linguistic Inquiry and Word Count (LIWC) is a text analysis tool that quantifies the presence of psychologically meaningful categories in a language, providing insights into the psychological and emotional content of texts. Horvát et al. ( 2018 ) used the Linguistic Inquiry and Word Count dictionary model. These studies found that higher counts of different words, punctuation, prepositions, and adjectives result in higher funding success.
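As an illustration of how such readability proxies can be computed in practice, here is a minimal Python sketch. It assumes the third-party textstat package and an invented description; it is not the tooling used in the studies cited above.

```python
# Illustrative sketch (not from the cited papers): common readability proxies
# for a campaign description, using the third-party `textstat` package.
import textstat

description = (
    "Our smart home hub secures every connected device on your network "
    "with end-to-end encryption and automatic firmware updates."
)

# Flesch Reading Ease: higher scores indicate easier text.
ease = textstat.flesch_reading_ease(description)

# Flesch-Kincaid Grade: approximate U.S. school grade needed to read the text.
grade = textstat.flesch_kincaid_grade(description)

# A simple information-quantity proxy of the kind used in prior work.
word_count = len(description.split())

print(f"Reading ease: {ease:.1f}, grade level: {grade:.1f}, words: {word_count}")
```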

Dority et al. ( 2021 ) examined the impact of the language used in the campaign description on campaign success, and specifically for Title II equity-based crowdfunding. They examined the campaign descriptions and focused on tone and two aspects of readability: information quantity — the amount of information available to the investor, and information quality — the ease of understanding of the passage of text. Overall, the results indicate an inverted U-shaped relationship between information quantity, information quality, and tone and Title II equity crowdfunding campaign success.

To capture the tone of the crowdfunding campaign, previous research used sentiment analysis to identify how the sentiment of the project description may impact the success of crowdfunding campaigns. When humans approach text, they use inferences to determine the tone of the text, such as whether it is positive or negative. The inferences ultimately impact how the reader feels about a certain text and can have a significant impact on the decisions they make (Dority et al., 2021 ). The limited research on the impact of textual tone on funding success shows mixed patterns across different types of crowdfunding. For instance, Horvát et al. ( 2018 ) examined equity crowdfunding and found that negative emotions in the pitch are positively associated with funding probability. On the other hand, Wang et al. ( 2017 ) found that strong positive sentiment is associated with successful reward crowdfunding campaigns.

Uparna and Bingham ( 2022 ) studied over 30,000 entrepreneurial loan requests from one of the largest loan marketplaces to understand how the sentiment in text-only pitches to investors affects fundraising. They found that pitches with negative sentiment are funded faster than those with positive sentiment, and that pitches with negative sentiment result in lower interest rates for entrepreneurs and fewer defaults. Peng et al. ( 2022 ) analyzed donation data to investigate how individuals’ donation behavior is affected by previous donation amounts and the information provided by the fundraising platform. They found that positive sentiment in the messages left by donors does not affect subsequent donation amounts.

Several papers investigated rewards-based crowdfunding project success using sentiment analysis. Li et al. ( 2022 ) examined the success determinants of cultural and creative crowdfunding (CCCF) projects using Natural Language Processing (NLP) to calculate the sentiment and information entropy of reviews in crowdfunding projects. They found a positive influence of peer review valence in CCCF projects on crowdfunding success. Valence indicates the average sentiment valence of all reviews of a crowdfunding project. Wang et al. ( 2022a ) used sentiment analysis and paired sample t -tests to examine differences in crowdfunding campaigns before and after the COVID-19 outbreak in March 2020. Their findings suggest that sad emotions were significant in the description of campaigns following the COVID-19 outbreak.

Wang et al. ( 2022b ) investigated information distortion in investment decision-making within the crowdfunding market. They discovered that a more detailed project description with a positive sentiment encourages investors to invest in the project. Based on the results of these studies and the assumption that crowdfunding investors may be particularly susceptible to sentiment in emerging and specialized industries, we expect that sentiments derived from textual analysis will also play a role in cybersecurity and IoT crowdfunding success. Crowdfunding success is defined and evaluated by three metrics in this research; further details are provided in the methodology section. We therefore formulated the following hypothesis:

Hypothesis 1 : A project description with a positive sentiment will be positively associated with the project’s success.

Horvát et al. ( 2018 ) analyzed United Kingdom equity crowdfunding data and focused primarily on the text associated with each campaign. They utilized the Linguistic Inquiry and Word Count dictionary to investigate stylistic aspects of the language and identify elements of the language that are associated with success, regardless of the type or sector of a venture. Latent Dirichlet Allocation was used by Horvát et al. ( 2018 ) to model the topics within campaign descriptions, revealing that the description of an equity crowdfunding campaign can significantly affect fundraising success. The extent to which campaigns are spread across topics was measured using entropy. Low entropy represents certainty and high entropy represents uncertainty. Horvát et al. ( 2018 ) found that the novelty of a campaign, as measured by the topic entropy of the text description, is negatively correlated with success: campaigns that are easily categorized into a few coherent topics are significantly more successful than their counterparts with a diversity of topics. For example, a topic consisting of student, school, education, and university is coherent. On the other hand, a topic consisting of film, bank, stove, and sport is incoherent. This result holds even after controlling for writing quality and style, as well as a suite of variables previously identified by other studies to impact success. Given these results, the challenges faced by many companies in crafting clear and compelling value propositions for IoT products and services (Molling & Zanela Klein, 2022 ), and the general lack of understanding among audiences regarding IoT projects and services (Kumar et al., 2019 ), we also expect that topics identified in textual analysis that are not easily understood by the crowd may negatively influence crowdfunding fundraising success in this industry. We therefore hypothesize that:

Hypothesis 2 : A project description that involves cybersecurity and IoT technological topics will be negatively associated with crowdfunding project success.

Context of the study

The cybersecurity industry is growing rapidly, with entrepreneurs constantly starting up new technological businesses around the world. Market analysts estimate that the global information security market, of which cybersecurity is a part, will grow at a 5-year CAGR of 8.5% to reach $281.7 billion by 2027 (Fortune Business Insights, 2023 ). The largest cybersecurity IPO so far was CrowdStrike, an AI-powered endpoint security platform that protects corporate networks at vulnerable areas of connection, like laptops and phones. CrowdStrike went public in June 2019 at a $6.7B valuation. The large number of startups established annually may overwhelm market intelligence professionals and investors who try to predict which technologies have the potential to be successful.

Cybersecurity projects have been used as a context for this study for several reasons. First, such projects have a high potential to fundamentally reshape the way traditional industries work. New technologies, business models, and approaches that challenge the status quo are likely to significantly change the industry landscape, and therefore have the potential to produce high investment returns.

However, at the same time, this type of project is also prone to being disrupted by competing projects in the short term, such as by a novel technology that makes the project redundant (Jensen & Özkil, 2018 ; Zhu et al., 2021 ). In other words, a slow or inefficient implementation process of new research in market technologies can lead to a good project being undermined by other projects that have transitioned faster but are not necessarily better. First, a swift and successful transition of new research to market technologies is therefore necessary to prevent the project from being undermined. Second, cybersecurity technologies often require uncommon, specialized knowledge, which the crowd does not generally possess, therefore increasing the risk of lower or slower adoption. Third, the available qualified workforce to defend computer systems is not growing fast enough. According to some industry reports, there are more job openings than individuals qualified to fill them (Lewis & Crumpler, 2019 ), and there will soon be a shortage of cybersecurity professionals (Ventures, 2017 ). Finally, there is uncertainty regarding the underlying science of cybersecurity since much of the scientific research is funded by organizations or governmental agencies with high levels of confidentiality (Maughan et al., 2013 , 2015 ). This further exacerbates the uncertainty associated with cybersecurity and the information asymmetry faced by investors in general and crowdfunding backers in particular.

Sample and data

New cybersecurity and privacy-related technologies are essential to the security and cyber-resilience of systems and infrastructure. The World Economic Forum defines cyber-resilience as “the ability of systems and organizations to withstand cyber events, measured by the combination of mean time to failure and mean time to recovery” (World Economic Forum, 2012 ). The use of the term “cyber” encompasses the interdependent network of information technology and includes technological tools such as the internet, telecommunication networks, and computer systems (Gortney, 2016 ). Artificial intelligence, blockchain technology, and their integration with the IoT enable many potential applications related to cybersecurity and consequently unique opportunities for both entrepreneurs and investors.

In this work, we used a similar methodology to Song et al. ( 2019 ). We used data from webrobots.io to compile a dataset of projects from Kickstarter and IndieGoGo. We preprocessed the data and removed all duplicate entries of projects that appeared under multiple categories and projects that are not “finished,” such as projects that are active, cancelled, or suspended. We included only projects related to cybersecurity or IoT by requiring one or more of the following phrases in the description: “Cybersecurity,” “Cyberwarfare,” “Secure Coding,” “Cyber Threats,” “Cyber Privacy,” “Blockchain,” “Cryptocurrency,” “Artificial Intelligence Security,” “AI Cyber,” “Internet of Things,” “IoT,” “Web Security,” “Network Security,” “Information Security,” “Internet Security,” “Mobile Security,” “Firewall,” “Antivirus,” “Hacker,” “Smart Home,” and “Raspberry Pi.” We manually reviewed the dataset and removed any project that was unrelated to the topic. The final dataset consists of 657 projects, of which 539 are from Kickstarter and 118 are from IndieGoGo.
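As a rough illustration of this filtering step (not the authors' code), the sketch below assumes a pandas DataFrame loaded from a hypothetical projects.csv with project_id, state, and description columns; the final manual review described above is, of course, not automated.

```python
# Minimal sketch of keyword-based filtering for cybersecurity/IoT projects.
# Note: naive substring matching (e.g., "iot") can over-match, which is one
# reason the dataset was manually reviewed afterwards.
import pandas as pd

KEYWORDS = [
    "cybersecurity", "cyberwarfare", "secure coding", "cyber threats",
    "cyber privacy", "blockchain", "cryptocurrency",
    "artificial intelligence security", "ai cyber", "internet of things", "iot",
    "web security", "network security", "information security",
    "internet security", "mobile security", "firewall", "antivirus", "hacker",
    "smart home", "raspberry pi",
]

def is_relevant(description: str) -> bool:
    """Return True if the description mentions at least one target phrase."""
    text = str(description).lower()
    return any(keyword in text for keyword in KEYWORDS)

projects = pd.read_csv("projects.csv")                     # hypothetical input file
projects = projects.drop_duplicates(subset="project_id")   # hypothetical id column
# Keep only finished campaigns (drop active, cancelled, suspended).
projects = projects[projects["state"].isin(["successful", "failed"])]
cyber_iot = projects[projects["description"].apply(is_relevant)]
print(len(cyber_iot), "candidate projects before manual review")
```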

Model specification

We constructed a model for estimating project success with a common set of relevant control variables and used the following model (Eq. 1) to test the hypotheses. The dependent variable is project success. Let \(X_i\) denote the vector of independent variables, which includes the sentiment index and the set of textual topic variables; let \(C_i\) denote the vector of project-level control variables (project characteristics); let \(M_t\) denote the macro-level, economy-wide indicator; let \(\tau_t\) denote the vector of time-fixed effects (year dummies); and let \(\varepsilon_i\) denote the error term:

\(\mathrm{Success}_i = \beta_0 + \boldsymbol{\beta}_1 X_i + \boldsymbol{\beta}_2 C_i + \beta_3 M_t + \tau_t + \varepsilon_i\)  (1)

Dependent variable

Following previous studies, we used three different operationalizations for cybersecurity and IoT project success (Cumming et al., 2017 ). First, project success was measured by the ratio between the total amount of money raised and the project fundraising goal, denoted as a continuous variable ( funds ). Second, we constructed a binary variable indicating whether the project succeeded in raising the predetermined amount of money in full ( outcome ). The outcome variable is based on the Kickstarter “all or nothing” model that indicates if the project fully accomplished its financial goal, i.e., whether it was successful or failed (Cumming et al., 2017 ). Third, we used the number of backers of each project as a discrete variable ( backers ).
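The three operationalizations can be illustrated with a small pandas sketch; the pledged, goal, and backers column names are assumptions for illustration, not the authors' schema.

```python
# Sketch (assumed column names) of the three success operationalizations.
import pandas as pd

df = pd.DataFrame({
    "pledged": [12000.0, 800.0, 55000.0],
    "goal":    [10000.0, 5000.0, 60000.0],
    "backers": [150, 12, 430],
})

# 1) funds: ratio of the amount raised to the fundraising goal (continuous).
df["funds"] = df["pledged"] / df["goal"]

# 2) outcome: 1 if the goal was fully reached ("all or nothing"), else 0 (binary).
df["outcome"] = (df["pledged"] >= df["goal"]).astype(int)

# 3) backers: number of backers (discrete count variable), already in the data.
print(df[["funds", "outcome", "backers"]])
```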

The correlation between the funds and the other operationalizations is very low and insignificant ( r  = 0.043 and r  =  − 0.001, respectively). The correlation between the outcome and backers variables is only moderate but significant ( r  = 0.337, p -value < 0.05). These findings support our decision to measure project success using three different metrics, as each metric describes different aspects that are not described by the other metrics.

Independent variables

Sentiment index

Crowdfunding platforms enable entrepreneurs to provide textual information to potential backers to encourage backing for their venture. Therefore, it is important for entrepreneurs to identify and signal certain features of their projects, such as the technologies used and positive sentiment, to influence the investment decisions of backers.

The lexicon-based approach to sentiment analysis uses a predefined dictionary with sentiment labels assigned to words, such that each word is labeled as positive, negative, or neutral. The word sentiment scores are then combined to determine the overall sentiment orientation of the text. We used the lexicon-based approach to determine the sentiment index of the texts in the campaigns and calculate the orientation of a project from the semantic orientation of words or phrases (Ngoc & Yoo, 2014 ). Previous research that used the lexicon-based approach determined the sentiment by identifying adjectives from the text that correspond with the dictionary of words, and the total sentiment score reflected the polarity of the text (Dorfleitner et al., 2016 ; Horvát et al., 2018 ).

We used VADER, a Valence Aware Dictionary and sEntiment Reasoner model, to measure the sentiment index of the description of each cybersecurity and IoT project. The sentiment score ranges from − 1 for the most negative sentiment to + 1 for the most positive sentiment (Hutto & Gilbert, 2014 ).
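A minimal sketch of how a VADER sentiment index of this kind can be computed with the vaderSentiment package (Hutto & Gilbert, 2014); the example description is invented and the snippet is not the authors' code.

```python
# Score a project description with VADER; the `compound` score is the
# sentiment index in [-1, +1] referenced in the text.
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

description = (
    "A brilliant, easy-to-use smart home hub that keeps your family safe "
    "and your data private."
)

scores = analyzer.polarity_scores(description)
sentiment_index = scores["compound"]   # -1 (most negative) to +1 (most positive)
print(scores)                          # {'neg': ..., 'neu': ..., 'pos': ..., 'compound': ...}
print(sentiment_index)
```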

Technology textual topic variables

Crowdfunding and other campaigns by firms in sectors with pronounced information problems are more sensitive to soft information (Cumming et al., 2017 ). The understandability of the concept or offering of a product or service in these sectors is a rather complex feature to measure. Therefore, cybersecurity and IoT projects can be considered at higher risk than projects in more traditional industries, and as such, their application needs to be thoroughly clarified to entice potential backers.

Technological innovations in artificial intelligence, cloud computing, big data analytics, quantum computing, blockchain, and other software and hardware applications ensure that contemporary cybersecurity will remain in flux (Wilner, 2018 ). IoT is an enabler for the intelligence affixed to several essential features of the modern world, such as homes, hospitals, buildings, transport, and cities. There are many benefits provided by IoT, but it comes with challenges, such as poor management, energy efficiency, identity management, security, and privacy (Yaqoob et al., 2017 ). Security and privacy are some of the critical issues related to the wide application and adaptation of IoT (Burhan et al., 2018 ).

In the case of cybersecurity and IoT project campaigns, we expected that certain words included in the project description may influence the decision of potential backers. We therefore mined the descriptions of projects for frequent words related to their technological attributes. We extracted and tokenized the project descriptions from Kickstarter and IndieGoGo. We preprocessed the texts by converting them to lowercase and removing stop-words and punctuation. To reduce the noise, we then removed words that appeared fewer than 25 times according to the term frequency distribution. The process revealed ten textual variables. Since the frequency of each word is relatively low, we created binary technology topic variables by combining keywords of the same subject that represent the technology or the topic of the project. The final ten binary textual variables are: “Software,” “Hardware,” “DIY” (Do It Yourself), “Raspberry Pi,” “IoT,” “Blockchain,” “Cybersecurity,” “Cryptocurrency,” “Arduino,” and “Smart Home.”
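The sketch below illustrates the general shape of such a pipeline (tokenization, stop-word removal, frequency threshold, binary topic indicators); it is not the authors' code, and the keyword groupings shown are hypothetical examples.

```python
# Illustrative preprocessing and binary technology-topic variables.
import re
from collections import Counter

import pandas as pd

STOPWORDS = {"a", "an", "and", "for", "of", "or", "the", "to", "with", "your"}

def tokenize(text: str) -> list:
    """Lowercase, strip punctuation, and drop stop-words."""
    tokens = re.findall(r"[a-z]+", str(text).lower())
    return [t for t in tokens if t not in STOPWORDS]

descriptions = pd.Series([
    "An open source Raspberry Pi firewall for your smart home",
    "A blockchain wallet for secure cryptocurrency payments",
])

# Term frequencies over the corpus; the paper drops terms appearing fewer than
# 25 times (this toy corpus is far too small for the threshold to bite).
term_freq = Counter(tok for doc in descriptions for tok in tokenize(doc))
frequent_terms = {t for t, c in term_freq.items() if c >= 25}

# Binary topic indicators built from groups of related keywords (illustrative).
TOPIC_KEYWORDS = {
    "RaspberryPi":    {"raspberry", "pi"},
    "Blockchain":     {"blockchain"},
    "Cryptocurrency": {"cryptocurrency", "crypto", "bitcoin"},
    "SmartHome":      {"smart", "home"},
}

topics = pd.DataFrame({
    name: descriptions.apply(lambda d, kw=kw: int(bool(kw & set(tokenize(d)))))
    for name, kw in TOPIC_KEYWORDS.items()
})
print(topics)
```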

Control variables

We included six control variables in the model. First, we controlled for project duration, measured as the number of days between the launch date and the project deadline ( Project Duration ). Second, we controlled for the project title length, measured as the number of characters ( Title Length ). Third, we controlled for the project description length, measured as the number of characters ( Description Length ). Fourth, we included a binary variable that takes the value of 1 if the project is from the United States ( USA ) and 0 otherwise, as the project’s country of origin may affect the backers’ decision. Fifth, since project success may be affected by the platform, we added a binary variable that takes the value of 0 if the project was listed on IndieGoGo and the value of 1 if it was listed on Kickstarter ( Platform ). Sixth, we included the NASDAQ seven-day return prior to the project launch day, measured as a continuous variable ( Nasdaq Return ), as investment decisions are influenced by macroeconomic conditions in general (Drori et al., 2024 ), and cybersecurity and IoT venture decisions are impacted in particular by technology sector conditions (Campello & Graham, 2013 ; Chen et al., 2007 ).
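A sketch of how these six controls could be constructed is shown below; all file and column names (projects.csv, nasdaq.csv, launched_at, and so on) are assumptions for illustration, not the authors' data layout.

```python
# Constructing the six control variables from hypothetical inputs.
import pandas as pd

projects = pd.read_csv("projects.csv", parse_dates=["launched_at", "deadline"])
nasdaq = pd.read_csv("nasdaq.csv", parse_dates=["date"]).set_index("date")["close"]
nasdaq = nasdaq.sort_index()  # label-based date slicing assumes a sorted index

projects["project_duration"] = (projects["deadline"] - projects["launched_at"]).dt.days
projects["title_length"] = projects["title"].str.len()
projects["description_length"] = projects["description"].str.len()
projects["usa"] = (projects["country"] == "US").astype(int)
projects["platform"] = (projects["source"] == "kickstarter").astype(int)  # 1 = Kickstarter, 0 = IndieGoGo

def nasdaq_7d_return(launch_date: pd.Timestamp) -> float:
    """Index return over the seven calendar days preceding the launch date."""
    window = nasdaq.loc[launch_date - pd.Timedelta(days=7): launch_date]
    return window.iloc[-1] / window.iloc[0] - 1.0

projects["nasdaq_return"] = projects["launched_at"].apply(nasdaq_7d_return)
```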

Estimation approach

As the dependent variable, i.e., the success of the project, was measured in three different ways, we used different methods corresponding to the scales and unique features of the variables. We used an Ordinary Least Squares (OLS) regression Footnote 1 for the continuous funds variable, a logistic regression for the binary outcome variable, and a count data model for the discrete backers variable, which counts the number of backers for the project. We opted to implement a negative binomial regression model, rather than a Poisson model, because the latter assumes equality between the conditional mean and conditional variance (Cameron & Trivedi, 2013 ), which does not characterize the distribution of the backers variable (mean = 327; variance = 560,228). The post-estimation likelihood-ratio test of the dispersion parameter alpha in the negative binomial model ( α  = 2.326) indicates that it is significantly greater than zero (chi-squared = 410,000, p  < 0.001). This result strongly suggests that the dependent variable is over-dispersed, thus confirming the choice of a negative binomial model (Xu & Drori, 2023 ). In addition, we estimated a Poisson regression model and found a high chi-square statistic, indicating that the Poisson model is inappropriate in this case (χ² = 762,886, p  < 0.001).
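The following statsmodels sketch, on synthetic data, illustrates the three estimators and the over-dispersion check described above; it is not the authors' code, and the variables are randomly generated placeholders.

```python
# Sketch of the three estimators with a toy design matrix X.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "sentiment_index": rng.uniform(-1, 1, n),
    "smart_home": rng.integers(0, 2, n),
    "project_duration": rng.integers(10, 60, n),
})
X = sm.add_constant(X)

funds = rng.gamma(2.0, 0.5, n)                 # continuous success ratio
outcome = rng.integers(0, 2, n)                # binary "all or nothing" outcome
backers = rng.negative_binomial(1, 0.01, n)    # over-dispersed count of backers

ols_model = sm.OLS(funds, X).fit()                    # Model 1: funds
logit_model = sm.Logit(outcome, X).fit()              # Model 2: outcome
negbin_model = sm.NegativeBinomial(backers, X).fit()  # Model 3: backers

# Over-dispersion check: the estimated dispersion parameter alpha should be
# significantly greater than zero, favoring negative binomial over Poisson.
print(negbin_model.params["alpha"], negbin_model.pvalues["alpha"])
```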

Table 1 presents the descriptive statistics of different project categories. Most of the projects are categorized as hardware projects (56%), and most originate in the USA (50%). Projects belonging to the smart home category, projects belonging to the cybersecurity category, and projects that originate in the USA attracted the highest average number of backers (517, 447, and 402, respectively). Projects in the Smart Home category, projects in the IoT category, and projects originating in the USA raised the most funds in their campaigns (US$101,550, US$58,958, and US$55,376, respectively). The most successful categories in terms of percentages of projects successfully raising their predetermined goals are Arduino, Raspberry Pi, and Hardware, with 72%, 71%, and 68% success rates, respectively. On the other hand, Software, Cryptocurrency, and Blockchain projects have been the least successful in raising their predetermined goals, with only 22%, 28%, and 30% success rates, respectively.

Correlation matrix and regression results

Table 2 presents the correlation matrix and descriptive statistics for all the researched variables. As the model includes both continuous and binary variables, the correlation matrix reports three different correlation methods. The correlation between two continuous variables was calculated using Pearson’s correlation. The correlation between a continuous variable and a binary variable was calculated using point-biserial correlation, which is mathematically equivalent to a Pearson correlation (Sheskin, 2003 ). The correlation between two binary variables was evaluated using the Phi coefficient (Cohen, 2013 ). All measurements are on a scale between − 1 for a negative correlation and + 1 for a positive correlation.
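On toy data, the three correlation measures can be computed as in the sketch below (the phi coefficient is computed by hand from the 2x2 contingency table); this is an illustration, not the authors' code.

```python
# Pearson, point-biserial, and phi correlations on synthetic variables.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sentiment = rng.uniform(-1, 1, 100)          # continuous
duration = rng.integers(10, 60, 100)         # continuous (treated as such)
outcome = rng.integers(0, 2, 100)            # binary
usa = rng.integers(0, 2, 100)                # binary

# Continuous vs. continuous: Pearson correlation.
r_cc, _ = stats.pearsonr(sentiment, duration)

# Continuous vs. binary: point-biserial correlation (equivalent to Pearson).
r_cb, _ = stats.pointbiserialr(outcome, sentiment)

# Binary vs. binary: phi coefficient from the 2x2 contingency table.
table = np.array([[np.sum((outcome == i) & (usa == j)) for j in (0, 1)] for i in (0, 1)])
a, b, c, d = table.ravel()
phi = (a * d - b * c) / np.sqrt((a + b) * (c + d) * (a + c) * (b + d))

print(round(r_cc, 3), round(r_cb, 3), round(phi, 3))
```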

Table 3 presents the results for three regression models to predict project success, a model for each operationalization for project success. Model 1 is an OLS regression to predict the success of a project, as measured by the funds variable. Model 2 implements logistic regression to predict the project’s success using the outcome variable. Model 3 uses count data regression (negative binomial) for the backers variable to predict the success of projects.

The sentiment index coefficients are positive and significant ( p -value < 0.05) across the three models ( β  = 1.281, β  = 2.752, β  = 0.437, respectively). The consistent results clearly show that a positive sentiment in the description of a project is associated with its success. Therefore, the empirical results of the three models support Hypothesis 1.

To test Hypothesis 2, we included ten technology textual topic binary variables in the model. We hypothesized that the inclusion of textual topics related to the technology of a project would affect its success. Except for the Hardware variable, all nine other textual variables were found to be significant in at least one of the models. The results therefore indicate that including the Hardware variable in the text does not affect the success of projects. Six of the nine significant topic variables, namely Software, DIY, IoT, Blockchain, Cryptocurrency, and Arduino, were found to have a negative effect on project success in at least one of the three models.

The Software and IoT variables have shown significant and negative effects across the three models, suggesting that including these variables in the descriptions of projects would decrease the likelihood of success, regardless of the operationalization method.

The Smart Home variable is the only textual binary variable that has shown a positive and significant coefficient across all three models. This consistent result suggests that Smart Home projects are appealing to potential backers on Kickstarter and IndieGoGo.

Given that nine of the textual binary variables, i.e., all but the Hardware variable, were found to have significant coefficients in at least one of the three models, we can conclude that Hypothesis 2 is well supported by the results. The consistent findings indicate that the textual topic description provided by entrepreneurs regarding the technology category is an important factor in project success.

The results further show that some of the control variables also consistently affect all three models. Project duration is negatively and significantly associated with project success, suggesting that, in line with previous findings from Cumming et al. ( 2017 ), Mollick ( 2014 ), and Song et al. ( 2019 ), a longer project duration has a negative effect on the likelihood of achieving success. Similarly, across all three models, the platform on which the project was featured has a significant negative coefficient, indicating that being featured on the Kickstarter platform is related to lower success as compared to being featured on IndieGoGo. In addition, the title length was found to have a significant positive effect across all three models, meaning that a longer project title is associated with higher success rates. The description length and NASDAQ variables were found to have an insignificant effect on the success, regardless of the dependent variable operationalization. Lastly, we found that projects originating in the USA have significantly higher chances of success in terms of the funds raised (Model 1) and the number of backers (Model 3).

Robustness checks

To reinforce the results, we removed outliers by winsorizing the samples at the first and last percentiles and ran the models again. We also estimated Model 3 using an OLS regression after adding one to the number of backers and log-transforming it (instead of using a count data model). The results were consistent in both cases.
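A sketch of both robustness checks on synthetic data, assuming the same kind of design-matrix setup as in the estimation sketch above; it is not the authors' code.

```python
# 1% winsorization of the dependent variable, and a log-linear OLS alternative
# to the count model.
import numpy as np
import statsmodels.api as sm
from scipy.stats.mstats import winsorize

rng = np.random.default_rng(2)
backers = rng.negative_binomial(1, 0.01, 300).astype(float)
X = sm.add_constant(rng.uniform(-1, 1, (300, 3)))

# Winsorize the dependent variable at the 1st and 99th percentiles, then refit.
backers_w = np.asarray(winsorize(backers, limits=[0.01, 0.01]))
negbin_w = sm.NegativeBinomial(backers_w, X).fit()

# OLS on log(backers + 1) as an alternative to the negative binomial model.
ols_log = sm.OLS(np.log1p(backers), X).fit()
print(negbin_w.params, ols_log.params)
```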

This study examines whether the sentiment and textual characteristics of projects play a role in crowdfunding success for cybersecurity and IoT projects. Ventures have more knowledge about their products, processes, and orientations in comparison to potential backers (Courtney et al., 2017 ). Backers will therefore consider textual topic features as part of their investment decision process, which ultimately affects the project’s success.

The results of this study show that positive sentiment in textual aspects of a campaign is positively associated with project success. These results support Hypothesis 1, according to which the success of cybersecurity and IoT projects is affected by the sentiment of their descriptions. The findings, according to which positive sentiment promotes investment and negative sentiment discourages investment, are in line with those of Wang et al. ( 2022b ).

Another objective of this study was to investigate whether specific technological topics used by cybersecurity and IoT projects are associated with an increased or decreased likelihood of campaign success. The results show that nine out of ten textual technological topic variables are significantly associated with project success. Topics that are less understood by the audience, such as IoT and Arduino, which is a platform for creating interactive electronic objects that is commonly used for prototyping, are associated with a decreased likelihood of campaign success. These results are in line with previous research that found that the crowd is less familiar with and has less understanding of the meaning and opportunities associated with IoT projects (Molling & Zanela Klein, 2022 ).

Our findings also show that projects that explicitly mention Smart Home technologies, and cybersecurity projects that provide relatively more information through their title, are more likely to be successful than those that do not. In contrast, software and IoT-related projects are more likely to fail compared to those with other technologies, no matter how success is defined. These findings are in line with previous work by Belleflamme et al. ( 2014 ), who found that companies that offer products are more successful in achieving their funding goals than those that offer services.

These results may suggest that backers are not yet familiar with technologies that are typically used in specific innovative communities, such as Arduino, which is commonly used for prototyping in the hardware development communities, or technologies that are relatively new and still not fully understood by the public. Blockchain and cryptocurrency are two such technologies, and the terms are often used interchangeably because cryptocurrencies typically employ blockchain technology. Other factors may have also influenced the decision to not back these projects, such as negative publicity and regulatory uncertainty surrounding cryptocurrency in recent years. Overall, these findings imply a lack of confidence by backers in projects involving new technologies.

The significantly positive control variables were found to be in line with Koning and Model ( 2013 ), where the number of backers had a strong and positive effect on project success, i.e., a larger number of backers represents a strong signal of project quality and high potential for success. People are more willing to trust a decision made by a large group of other investors in the context of the stock market (Kremer & Nautz, 2013 ), as well as when making online purchases (Ye & Fang, 2013 ). The results of this study also indicate that the more unfamiliar the public is with a certain technological term, the less money each backer will be willing to invest, and therefore, the more backers are needed for the campaign to be successful. The finding that investors are likely to invest when they understand the project is in line with Härkönen ( 2014 ), who emphasized the importance of the public’s ability to easily understand the product or service offered by the campaign.

Theoretical and practical implications

Research on entrepreneurial finance emphasizes the challenges related to information asymmetries between investors and start-up companies (Agrawal et al., 2014 ; Ahlers et al., 2015 ). These challenges are further exacerbated in crowdfunding, as online platforms arguably offer fewer opportunities for interactions between entrepreneurs and investors (Efrat & Gilboa, 2020 ). A variety of studies have shown that in order to mitigate the risks associated with information asymmetries, investors put greater emphasis on both the type and the style of information, allowing potential investors to better evaluate projects, which ultimately leads to a higher likelihood of funding success (Dorfleitner et al., 2016 ; Horvát et al., 2018 ).

From a theoretical point of view, the results of the various models presented in this study show the importance of textual description in crowdfunding campaigns of projects in specialized industries, such as cybersecurity and IoT, and the importance of sentiment in the campaign’s description to the success of a campaign. Although previous research examined the role of sentiment analysis in general crowdfunding campaigns and not in an industry-specific context, this study shows that sentiment is equally important in specialized projects that require investors to have specific knowledge to understand them. This study also demonstrates that previous findings on what drives crowdfunding success in general are also true for very specialized industries, such as cybertechnology and IoT.

From a practical standpoint, the results presented in this study provide further insights for both investors and entrepreneurs interested in investing in specialized projects through crowdfunding platforms. Campaigns need to pay particular attention to the tone of the text used to describe the projects, which should be positive to signal optimism and confidence to potential investors. For example, “Project X” is a cyber violence and governmental surveillance project that eventually failed, possibly partially due to its negative sentiment of − 0.62. Conversely, “Momo,” a successful project, was described as a smart home robot equipped with artificial intelligence that was designed as a super hub with standalone security features. The “Momo” project successfully achieved its funding goal, likely in part because of its positive sentiment of 0.9.

Longer project duration negatively and consistently affects the likelihood of success for a campaign. Entrepreneurs may therefore be advised not to use the full campaign duration available on the platform. In our analysis, we found that IndieGoGo projects had a higher success rate than Kickstarter projects, perhaps due to self-selection bias, where projects with better prospects prefer to raise funds through IndieGoGo rather than Kickstarter.

Conclusions

This research is the first to investigate the effect of sentiment and description topic characteristics on crowdfunding success in specific industries, namely IoT and cybersecurity. We found that the sentiment of the project description affects the success of crowdfunding campaigns for projects involving cybersecurity and IoT. According to these findings, entrepreneurs are encouraged to pay attention to the text they use to describe their projects, which should be positive, to signal optimism and confidence to potential investors.

In addition, this work demonstrated that technology textual topics that investors are less familiar with are negatively associated with crowdfunding project success. The findings of this work are expected to provide useful insights for entrepreneurs in the area of cybersecurity and IoT and help them achieve better results and higher success rates in their crowdfunding campaigns.

Limitations and future research

Future work can potentially analyze other platforms and projects in other languages. Considering that this study focused on two of the main crowdfunding platforms that operate in English, testing the hypotheses on platforms that operate in other languages could generalize the findings. The role of sentiment in the crowdfunding industry could also be explored further by, for example, applying sentiment analysis to comments made by potential backers. Despite the concerns expressed by scholars regarding the suitability of the crowdfunding industry for specialized projects, this work shows that it is possible for such projects to succeed on these platforms. However, we also argue that further investigation of the crowdfunding industry is necessary to unpack the differences between sectors and industries. General findings regarding what drives success in crowdfunding projects are not necessarily relevant for projects in all sectors, especially when discussing technological projects that require investors to have specialized knowledge to understand them.

Additionally, future research can investigate the seemingly natural behavior of potential crowd investors who do not sufficiently understand a technological project, but are driven by gut feelings about its potential, and are therefore likely to invest less money than they would otherwise, thus resulting in a need for more backers to reach the funding target.

Data availability

Data will be made available upon reasonable request.

Footnote 1: We implemented the OLS regression because only 9 observations out of 657 have zero values. Since the variable was not zero-inflated, no special treatment is required.

Agrawal, A., Catalini, C., & Goldfarb, A. (2014). Some simple economics of crowdfunding. Innovation Policy and the Economy, 14 (1), 63–97. https://doi.org/10.1086/674021


Ahlers, G. K., Cumming, D., Günther, C., & Schweizer, D. (2015). Signaling in equity crowdfunding. Entrepreneurship Theory and Practice, 39 (4), 955–980. https://doi.org/10.1111/etap.12157

Akerlof, G. A. (1970). The market for “Lemons”: Quality uncertainty and the market mechanism. Quarterly Journal of Economics, 84 (3), 488–500. https://doi.org/10.1016/B978-0-12-214850-7.50022-X

Baum, J. A., & Silverman, B. S. (2004). Picking winners or building them? Alliance, intellectual, and human capital as selection criteria in venture financing and performance of biotechnology startups. Journal of Business Venturing, 19 (3), 411–436. https://doi.org/10.1016/S0883-9026(03)00038-7

Belleflamme, P., Lambert, T., & Schwienbacher, A. (2014). Crowdfunding: Tapping the right crowd. Journal of Business Venturing, 29 (5), 585–609. https://doi.org/10.1016/j.jbusvent.2013.07.003

Best, J., Lambkin, A., Neiss, S., Raymond, S., & Swart, R. (2013). Crowdfunding’s potential for the developing world (p. 1). InfoDev.


Block, J., Hornuf, L., & Moritz, A. (2018). Which updates during an equity crowdfunding campaign increase crowd participation? Small Business Economics, 50 , 3–27. https://doi.org/10.1007/s11187-017-9876-4

Burhan, M., Rehman, R. A., Khan, B., & Kim, B.-S. (2018). IoT elements, layered architectures and security issues: A comprehensive survey. Sensors, 18 (9), 2796. https://doi.org/10.3390/s18092796

Cameron, A. C., & Trivedi, P. K. (2013). Regression analysis of count data (Vol. 53). Cambridge University Press.

Campello, M., & Graham, J. R. (2013). Do stock prices influence corporate decisions? Evidence from the technology bubble. Journal of Financial Economics, 107 (1), 89–110. https://doi.org/10.1016/j.jfineco.2012.08.002

Chen, Q., Goldstein, I., & Jiang, W. (2007). Price informativeness and investment sensitivity to stock price. The Review of Financial Studies, 20 (3), 619–650. https://doi.org/10.1093/rfs/hhl024

Cohen, J. (2013). Statistical power analysis for the behavioral sciences . Academic Press. https://doi.org/10.4324/9780203771587


Courtney, C., Dutta, S., & Li, Y. (2017). Resolving information asymmetry: Signaling, endorsement, and crowdfunding success. Entrepreneurship Theory and Practice, 41 (2), 265–290. https://doi.org/10.1111/etap.12267

Craigen, D., Diakun-Thibault, N., & Purse, R. (2014). Defining Cybersecurity. Technology Innovation Management Review, 4 (10), 13–21.

Cumming, D. J., Leboeuf, G., & Schwienbacher, A. (2017). Crowdfunding cleantech. Energy Economics, 65 , 292–303. https://doi.org/10.1016/j.eneco.2017.04.030

Dorfleitner, G., Priberny, C., Schuster, S., Stoiber, J., Weber, M., de Castro, I., & Kammler, J. (2016). Description-text related soft information in peer-to-peer lending–Evidence from two leading European platforms. Journal of Banking & Finance, 64 , 169–187. https://doi.org/10.1016/j.jbankfin.2015.11.009


Open access funding provided by International Institute for Applied Systems Analysis (IIASA).

Author information

Authors and Affiliations

Faculty of Information Systems and Computer Science, College of Law and Business, 26 Ben-Gurion St., Ramat-Gan, Israel

Abraham Yosipof & Or Elroy

International Institute for Applied Systems Analysis, Laxenburg, Austria

Abraham Yosipof

D’Amore-McKim School of Business, Northeastern University, Boston, MA, USA

Netanel Drori

Department of Computer Science, University of Oregon, Eugene, OR, USA

Faculty of Economics and Social Science, Universitat Internacional de Catalunya, Carrer de La Immaculada, 22, 08017, Barcelona, Spain

Yannis Pierraki

Corresponding author

Correspondence to Abraham Yosipof.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Responsible Editor: Samuel Fosso Wamba

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Yosipof, A., Drori, N., Elroy, O. et al. Textual sentiment analysis and description characteristics in crowdfunding success: The case of cybersecurity and IoT industries. Electron Markets 34, 30 (2024). https://doi.org/10.1007/s12525-024-00712-4

Received: 18 April 2023

Accepted: 12 April 2024

Published: 29 April 2024

DOI: https://doi.org/10.1007/s12525-024-00712-4

Keywords

  • Crowdfunding
  • Cybersecurity
  • Internet of Things
  • Sentiment analysis

COMMENTS

  1. A review on sentiment analysis and emotion detection from text

    3.1 Datasets for sentiment analysis and emotion detection. Table 2 lists numerous sentiment and emotion analysis datasets that researchers have used to assess the effectiveness of their models. The most common datasets in the field of sentiment and emotion analysis are SemEval, the Stanford Sentiment Treebank (SST), and the International Survey on Emotion Antecedents and Reactions (ISEAR).

  2. A systematic review of social media-based sentiment analysis: Emerging

    2.1. The identification of research questions. Sentiment analysis techniques have been shown to enable individuals, organizations and governments to benefit from the wealth of meaningful information contained in the unstructured data of social media, and there has been a great deal of research devoted to the design of high-performance sentiment classifiers and their applications [1], [4], [5 ...

  3. Sentiment Analysis

    Sentiment Analysis. 1293 papers with code • 39 benchmarks • 93 datasets. Sentiment Analysis is the task of classifying the polarity of a given text. For instance, a text-based tweet can be categorized into either "positive", "negative", or "neutral". Given the text and accompanying labels, a model can be trained to predict the correct ...

  4. A Survey of Sentiment Analysis: Approaches, Datasets, and Future Research

    Given the importance of sentiment analysis, this paper provides valuable insights into the current state of the field and serves as a valuable resource for both researchers and practitioners. The information presented in this paper can inform stakeholders about the latest advancements in sentiment analysis and guide future research in the field.

  5. More than a Feeling: Accuracy and Application of Sentiment Analysis

    This makes accuracy, i.e., the share of correct sentiment predictions out of all predictions, also known as hit rate, a critical concern for sentiment research. Hartmann et al. (2019) were among the first to conduct a systematic comparison of the accuracy of sentiment analysis methods for marketing applications.

  6. A survey on sentiment analysis methods, applications, and challenges

    The rapid growth of Internet-based applications, such as social media platforms and blogs, has resulted in comments and reviews concerning day-to-day activities. Sentiment analysis is the process of gathering and analyzing people's opinions, thoughts, and impressions regarding various topics, products, subjects, and services. People's opinions can be beneficial to corporations, governments ...

  7. A comprehensive survey on sentiment analysis ...

    Since 2004, sentiment analysis has become the fastest growing and the most active research area, as there has been a massive increase in the number of papers focusing on sentiment analysis and opinion mining recently [18]. Fig. 1 shows the rising popularity of sentiment analysis according to Google Trends.

  8. [2311.11250] A Comprehensive Review on Sentiment Analysis: Tasks

    It plays a vital role in analyzing big data generated daily in structured and unstructured formats over the internet. This survey paper defines sentiment and its recent research and development in different domains, including voice, images, videos, and text. The challenges and opportunities of sentiment analysis are also discussed in the paper.

  9. Sentiment analysis using product review data

    Sentiment analysis or opinion mining is one of the major tasks of NLP (Natural Language Processing). Sentiment analysis has gained much attention in recent years. In this paper, we aim to tackle the problem of sentiment polarity categorization, which is one of the fundamental problems of sentiment analysis. A general process for sentiment polarity categorization is proposed with detailed process ... (A minimal illustrative polarity-classification pipeline appears after this list.)

  10. Systematic reviews in sentiment analysis: a tertiary study

    With advanced digitalisation, we can observe a massive increase of user-generated content on the web that provides opinions of people on different subjects. Sentiment analysis is the computational study of analysing people's feelings and opinions for an entity. The field of sentiment analysis has been the topic of extensive research in the past decades. In this paper, we present the results of ...

  11. Sentiment Dimensions and Intentions in Scientific Analysis ...

    Sentiment Analysis in text, especially text containing scientific citations, is an emerging research field with important applications in the research community. This review explores the field of sentiment analysis by focusing on the interpretation of citations, presenting a detailed description of techniques and methods ranging from lexicon-based approaches to Machine and Deep Learning models.

  12. (PDF) Sentiment Analysis

    In this paper, we combine the requirements of two subtasks to propose a new aspect-based sentiment analysis framework based on span, which is a simple and effective joint model to generate all ...

  13. (PDF) Sentiment Analysis Using Deep Learning

    Sentiment analysis on Twitter uses several approaches, where deep learning has achieved strong results in emotion recognition. This paper focuses on classifying user emotion in Twitter messages ...

  14. Exploring Sentiment Analysis Techniques in Natural Language Processing

    Sentiment analysis is the process of recognizing and extracting subjective information from textual data. It includes analyzing opinions, attitudes, emotions, and feelings articulated in a text and categorizing them as positive, negative, or neutral sentences [1]. SA has gained a lot of popularity in recent years due to the abundance of user ...

  15. Sentiment Analysis in Social Media and Its Application: Systematic

    Abstract. This paper reports a review of sentiment analysis in social media that explored the methods used, the social media platforms involved, and their applications. Social media contain a large amount of raw data uploaded by users in the form of text, videos, photos and audio. The data can be converted into valuable information by using ...

  16. (PDF) A Review On Sentiment Analysis Methodologies ...

    Sentiment analysis is a technique for looking at information that is in the form of text and determining the opinion content of that text. It is also termed emotion or feeling mining. On ... (A minimal lexicon-based scoring sketch appears after this list.)

  17. Survey on sentiment analysis: evolution of research methods and topics

    Sentiment analysis, one of the research hotspots in the natural language processing field, has attracted the attention of researchers, and research papers in the field are increasingly being published. Many literature reviews on sentiment analysis involving techniques, methods, and applications have been produced using different survey methodologies and tools, but there has not been a survey ...

  18. A Survey on Sentiment Analysis

    The practical results reported in this paper come from the implementation of sentiment analysis on the IMDB movie reviews dataset. Evaluation metrics such as accuracy, precision, recall, and F1-score are used. This research-based survey is divided into sections, each concerning a step of the sentiment analysis process.

  19. Sentiment Analysis: A Comparative Study on Different Approaches

    Feature-driven SA: adaptable to large projects; a concise process; not as powerful on smaller projects. Conclusion: various sentiment analysis methods and the different levels at which sentiment can be analysed are studied in this paper, with the ultimate aim of sentiment analysis that efficiently categorizes various reviews.

  20. (PDF) A Study of Sentiment Analysis: Concepts ...

    A Study of Sentiment Analysis: Concepts, Techniques, and Challenges. Ameen Abdullah Qaid Aqlan, B. Manjula and R. Lakshman Naik. Abstract: Sentiment analysis (SA) is a process of extensive ...

  21. Sentiment analysis: A survey on design framework ...

    Sentiment analysis is a solution that enables the extraction of a summarized opinion or minute sentimental details regarding any topic or context from a voluminous source of data. Even though several research papers address various sentiment analysis methods, implementations, and algorithms, a paper that includes a thorough analysis of the process for developing an efficient sentiment analysis ...

  22. The evolution of sentiment analysis—A review of research topics, venues

    Consequently, 99% of the papers have been published after 2004. Sentiment analysis papers are scattered to multiple publication venues, and the combined number of papers in the top-15 venues only represent ca. 30% of the papers in total. We present the top-20 cited papers from Google Scholar and Scopus and a taxonomy of research topics.

  23. SENTIMENT ANALYSIS USING NATURAL LANGUAGE PROCESSING AND ...

    SENTIMENT ANALYSIS USING NATURAL LANGUAGE PROCESSING AND MACHINE LEARNING. April 2023. Shu Ju Cai Ji Yu Chu Li/Journal of Data Acquisition and Processing 38 (2):520-526. DOI: 10.5281/zenodo ...

  24. Joint sentiment analysis of lyrics and audio in music

    Sentiment or mood can express itself on various levels in music. In automatic analysis, the audio data is usually analyzed, but the lyrics can also play a crucial role in the perception of mood. We first evaluate various models for sentiment analysis based on lyrics and audio separately. The corresponding approaches already show satisfactory results, but they also exhibit ... (A minimal late-fusion sketch for combining per-modality predictions appears after this list.)

  25. Sentiment analysis algorithms and applications: A survey

    Sentiment Analysis (SA) is an ongoing field of research in the text mining field. SA is the computational treatment of opinions, sentiments and subjectivity of text. ... be useful for newcomer researchers in this field as it covers the most famous SA techniques and applications in one research paper. This survey uniquely gives a refined ...

  26. Textual sentiment analysis and description characteristics in

    The objective of this paper is to investigate how the sentiment of text in project campaigns, and project topic characteristics, influence crowdfunding success in innovative industries. ... The second and third sections review the academic literature related to sentiment analysis in crowdfunding research. The fourth section describes the ...
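Several entries above (for example, item 9 on product-review data) describe the same general supervised process for sentiment polarity categorization: collect labeled text, extract features, train a classifier, and evaluate it. The following is a minimal illustrative sketch of such a pipeline, not the specific method of any paper listed here; it assumes scikit-learn is installed and uses a tiny inline toy dataset purely for demonstration.

```python
# Minimal sentiment-polarity pipeline: vectorize text, train a classifier,
# and report accuracy on a held-out split. Toy data for illustration only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

texts = [
    "great product, works perfectly",
    "terrible quality, broke after a day",
    "absolutely love it",
    "waste of money, very disappointed",
    "good value and fast shipping",
    "awful experience, would not recommend",
]
labels = ["positive", "negative", "positive", "negative", "positive", "negative"]

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.33, random_state=0, stratify=labels
)

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(X_train, y_train)

predictions = model.predict(X_test)
print("accuracy:", accuracy_score(y_test, predictions))
```

On a real benchmark such as the IMDB reviews mentioned in item 18, the same pipeline shape applies, with the toy lists replaced by the loaded dataset and accuracy complemented by precision, recall, and F1.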
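Several of the surveys above (items 11 and 16, among others) contrast lexicon-based approaches with machine learning models: instead of training on labeled data, sentiment is scored by looking tokens up in a polarity dictionary. The snippet below is a deliberately small illustration of that idea; the lexicon, the negator list, and the thresholds are invented for the example and are not taken from any cited paper.

```python
# Toy lexicon-based sentiment scorer: sum word polarities from a small
# dictionary, flipping the sign of a word preceded by a negator.
LEXICON = {"good": 1.0, "great": 2.0, "love": 2.0, "bad": -1.0, "terrible": -2.0, "hate": -2.0}
NEGATORS = {"not", "never", "no"}

def lexicon_sentiment(text: str) -> str:
    tokens = text.lower().split()
    score = 0.0
    for i, token in enumerate(tokens):
        polarity = LEXICON.get(token.strip(".,!?"), 0.0)
        if polarity and i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity  # "not good" counts as negative
        score += polarity
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(lexicon_sentiment("The plot was not good, I hate the ending"))  # negative
print(lexicon_sentiment("Great soundtrack and a story I love"))       # positive
```

Real lexicon-based tools handle intensifiers, punctuation, and emoticons as well, but the core lookup-and-aggregate step is the one sketched here.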
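The entry on joint sentiment analysis of lyrics and audio describes evaluating separate models per modality before combining them. One generic way to combine such outputs, shown here only as an illustrative sketch and not necessarily the approach of that paper, is late fusion: average the class probabilities produced by each modality-specific model, optionally with a weight. The class list and the 0.6 lyrics weight below are assumptions for the example.

```python
# Late-fusion sketch: combine per-modality class probabilities by a
# weighted average and pick the highest-scoring sentiment class.
import numpy as np

CLASSES = ["negative", "neutral", "positive"]

def fuse(lyrics_probs, audio_probs, lyrics_weight=0.6):
    """Weighted average of two probability vectors over CLASSES."""
    lyrics_probs = np.asarray(lyrics_probs, dtype=float)
    audio_probs = np.asarray(audio_probs, dtype=float)
    fused = lyrics_weight * lyrics_probs + (1.0 - lyrics_weight) * audio_probs
    return CLASSES[int(np.argmax(fused))], fused

# Hypothetical model outputs for one song.
label, fused = fuse([0.2, 0.1, 0.7], [0.5, 0.3, 0.2])
print(label, fused)  # lyrics dominate at weight 0.6 -> "positive"
```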