literature review of stock market pdf

Stock Market Integration: An International Perspective pp 6–28 Cite as

A Literature Review of the Stock Market Integration

Ekaterina Dorodnykh

201 Accesses

This chapter provides a detailed analysis of stock exchange integration in an international perspective, studying the relationships between various stakeholders of stock exchange industry. Indeed, the interests represented by the various stakeholders are mutually different and sometimes conflicting and that can even affect the integration decisions of stock exchange market. This part entirely summarizes the literature review on stock market integration and the main transition steps towards the complete fusion between stock markets. New evidence on the integrative models suggests that once the trend of integration is firmly established within a market, it is logical to expect a phase of experimentation, in which various integration projects could start. The complete review of all integrative models is presented, reconstructing the underlying logic of stock exchange integration.

This is a preview of subscription content, log in via an institution .

Buying options

Available as PDF
Read on any device
Instant download
Own it forever
Available as EPUB and PDF
Durable hardcover edition
Dispatched in 3 to 5 business days
Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Unable to display preview. Download preview PDF.

You can also search for this author in PubMed Google Scholar

Copyright information

About this chapter

Cite this chapter.

Dorodnykh, E. (2014). A Literature Review of the Stock Market Integration. In: Stock Market Integration: An International Perspective. Palgrave Pivot, London. https://doi.org/10.1057/9781137381705_2

Download citation

DOI : https://doi.org/10.1057/9781137381705_2

Publisher Name : Palgrave Pivot, London

Print ISBN : 978-1-349-47968-9

Online ISBN : 978-1-137-38170-5

eBook Packages : Palgrave Economics & Finance Collection Economics and Finance (R0)

Share this chapter

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

Publish with us

Policies and ethics

Find a journal
Track your research

Academia.edu no longer supports Internet Explorer.

To browse Academia.edu and the wider internet faster and more securely, please take a few seconds to upgrade your browser .

Enter the email address you signed up with and we'll email you a reset link.

We're Hiring!
Help Center

LITERATURE REVIEW OF STOCK MARKET

Related Papers

Donexbefree Abdul

This study is motivated primarily by the need to enhance capital accumulation from the stock market, being the long term end of the financial system. This study is an investigation of the impact of Nigeria stock exchange performance on the economic growth of Nigeria

Global Finance Journal

Hamid Mohtadi

bhushan pawar

The World Bank Economic Review

Asli Demirguc-kunt

It is now almost a regularity that a nexus exists between financial development and economic growth. Importantly, the preponderance of recent evidence lends support to a finance leading growth relationship. This kind of nexus is of particular importance to any country or region seeking economic growth, such as we find in Africa, where sustainable growth had been seemingly more elusive than in other continents. We therefore set out in this article to examine the current state of financial development in Africa, especially as regards its relevance as a mechanism for eliciting sustainable and/or inclusive economic growth. Sequel to flagging shortcomings of key financial markets – i.e., public equity markets, private equity markets, private debt markets, and public debt markets – across Africa, we proffer reasoned ways forward that can contribute to getting these markets to live up to their promise of supporting sustainable economic growth.

debasis pahi

This paper investigates the effects of financial markets development on the financing choice of firms in developing and developed Asian market economies. The panel data regression models were used for a mean total of 6506 non-financial listed companies during 1995-2016 for 12 Asian economics. The estimated econometric models included short-term, long-term and total debt-equity ratios as dependent variables which were regressed on financial markets development variables (such as banking sector development and stock market development indicators) along with macroeconomic variables (such as inflation, GDP growth, FDI and firm-specific variables (such as ratio of total assets to GDP, ratio of dividends to total assets and ratio of net sales to net fixed assets) as control variables. Also, financing choice of firms in developed and developing stock markets was estimated by splitting the sample into subsamples of developing and developed stock markets. The financial development indicators such as domestic credit to private sector by banks and stock market capitalization exhibited contrasting differences between the selected developing and developed Asian economies. The econometric results indicated that the banking sector and stock market development indicators significantly have opposite effects on the financing choice of the selected firms: banking variable is associated with a rise in the debt-equity ratio whereas stock market variable is associated with a fall in leverage ratio. The econometric effects of stock market development on firms financing choices in developing and developed stock markets showed a remarkable divergence. The evidence indicated that the estimated coefficient of the banking sector indicator in the developed stock market subsample was consistently negative for all the three leverage ratios whereas the coefficient in the developing stock market subsample was positive and significant. The important conclusion of the study is that though banking sector and stock market play different roles are however, complementary to each other suggesting that the policymakers should aim to develop banking sector and stock market simultaneously which will help firms to design their optimal financing choices.

IOSR Journals

Stijn Claessens

Daniela Klingebiel

The Brazilian equity market is characterized by relatively low liquidity, high cost of capital (low firm valuation), and limited new capital raising. Ownership concentration of corporations is high, with large wedges between control and cash flow rights, leading to large differences in pricing of non-voting and voting shares, reflecting the risks of expropriation by insiders. In recent years, much of

Canadian Institute for Knowledge Development (CIKD)

Marketing and Branding Research (MBR) , Akbar Jelodari

In order to implement the profitable projects, achieve the maximum efficiency, and increase their shareholders, companies may use different types of financial resources in different ways. The ability of companies to identify the internal and external resources for providing capital and financial programs is considered as one of the main factors that affects on the growth and development of the companies. Financing resources and their usage volume are factors that affect on the companies’ operating performance. In this regard, companies and economic institutions can be financially provided from both inside and outside. Companies can issue and sell new common stocks to investors to provide their required financial resources. This study was conducted to investigate the issuance of stocks as one of resources of financing in the companies. Statistical population of this study was active listed companies in Tehran Stock Exchange. The sample of study was selected among listed companies raising capital through applying stage random sampling and simple random sampling. Sampling was conducted during the period 2008-20013 and finally the sample size including 40 companies were chosen using a Cochran formula. To analyze the obtained information, t-test and correlation coefficient were used. Although the results of the study revealed that there was no significant difference between the internal sources of financing and the issuance of stocks among the studied companies, there is a significant relationship between companies’ issuance of stocks and their size. Companies increase the use of retained earnings and stocks for financing through expanding the size of companies. Also, due to the existence of relationship between financing and the fixed assets of the companies through issuing the stocks of companies, no statistically significant relationship were observed between financing and companies’ profitability. Finally, there was no significant relationship between financing and future stock returns through the issuance of stocks.

RELATED PAPERS

Lathifah Salsabila

ALFA: Revista de Linguística

Waldir Beividas

Rebecca Allinson

South African Journal of Geomatics

Kenneth Cheruiyot

Jiří Tobíšek

Revista Geográfica Da América Central

Revista Geográfica de América Central

evi sofia Sofia

ECA Sinergia

Johanna Naula

European Journal of Agronomy

hongtao xing

Australian Journal of …

Shamima Akhtar Sharmin

Computer Methods in Applied Mechanics and Engineering

Marko Knezevic

Human Reproduction

Raquel Blanes

Acta Horticulturae

Silvana Nicola

Revista Latino-Americana de Enfermagem

Cleonice Andrea

Biological Research

Patricio Zapata

Journal of Cancer Metastasis and Treatment

Virendra Bhandari

Chemischer Informationsdienst

Liliane Bokobza

tufhfgd hfdfsd

Asian journal of surgery

Zhenbang Liu

Journal of Contemporary Public Administration (JCPA)

I Wayan Astawa

Developments in marketing science: proceedings of the Academy of Marketing Science

Adel El-Ansary

Asian Journal of Business and Accounting

Neuroepidemiology

Ahmed Jusabani

Browse Econ Literature

Working papers
Software components
Book chapters
JEL classification

More features

Subscribe to new research

RePEc Biblio

Author registration.

Economics Virtual Seminar Calendar NEW!

Stock Markets: An Overview and A Literature Review

Author & abstract
64 References
1 Citations
Most related
Related works & more

Corrections

A., Rjumohan

Suggested Citation

Download full text from publisher, references listed on ideas.

Follow serials, authors, keywords & more

Public profiles for Economics researchers

Various research rankings in Economics

RePEc Genealogy

Who was a student of whom, using RePEc

Curated articles & papers on economics topics

Upload your paper to be listed on RePEc and IDEAS

New papers by email

Subscribe to new additions to RePEc

EconAcademics

Blog aggregator for economics research

Cases of plagiarism in Economics

About RePEc

Initiative for open bibliographies in Economics

News about RePEc

Questions about IDEAS and RePEc

RePEc volunteers

Participating archives

Publishers indexing in RePEc

Privacy statement

Found an error or omission?

Opportunities to help RePEc

Get papers listed

Have your research listed on RePEc

Open a RePEc archive

Have your institution's/publisher's output listed on RePEc

Get RePEc data

Use data assembled by RePEc

To read this content please select one of the options below:

Please note you do not have access to teaching notes, determinants of stock market development: a review of the literature.

Studies in Economics and Finance

ISSN : 1086-7376

Article publication date: 6 March 2017

This paper aims to provide a comprehensive review of the literature on the determinants of stock market development.

Design/methodology/approach

The paper divides the existing studies into the theoretical and empirical literature. Then, it analyses these studies in turn.

Based on the theoretical literature, the determinants of stock market development can be broadly classified into two groups: macroeconomic factors and institutional factors. The theory and the empirics predict different ways in which macroeconomic factors affect stock market development. The real income and its growth rate foster stock market development, while the banking sector, interest rate and private capital flows can foster or inhibit it. Inflation and exchange rates have adverse effects on stock market development. In terms of the institutional factors, the literature indicates that different legal origins and stock market integration can have a positive or negative impact on stock market development. In addition, factors such as legal protection of investors, corporate governance, financial liberalisation and trade openness contribute positively to the development of the stock market.

Research limitations/implications

From the survey, it is imperative that policies which aim at enhancing institutional quality, financial integration, real income growth, macroeconomic stability and capital inflows, among others, will certainly promote stock market development within and across countries. Although the empirical studies have incorporated a large set of variables in their models, the theoretical studies do not contain rich models of stock market development. It is understandable that a theoretical model which contains a large set of the determinants of stock market development may be difficult to solve. However, such a model seems very appealing and will provide a unification of the existing literature.

Originality/value

The originality of the paper lies in the fact that it is the first to undertake a survey of the determinants of stock market development in the literature. It is hoped that this paper will spur further theoretical and empirical research on the determinants of stock market development.

Determinants
Literature review
Stock market development

Acknowledgements

The authors thank the Editor, Prof. Niklas Wagner, and an anonymous referee for their helpful comments and suggestions which have improved this paper substantially. All remaining shortcomings are ours.

Ho, S.-Y. and Njindan Iyke, B. (2017), "Determinants of stock market development: a review of the literature", Studies in Economics and Finance , Vol. 34 No. 1, pp. 143-164. https://doi.org/10.1108/SEF-05-2016-0111

Emerald Publishing Limited

We’re listening — tell us what you think, something didn’t work….

Report bugs here

All feedback is valuable

Please share your general feedback

Join us on our journey

Platform update page.

Visit emeraldpublishing.com/platformupdate to discover the latest news and updates

Questions & More Information

Answers to the most commonly asked questions here

An official website of the United States government

The .gov means it’s official. Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

The site is secure. The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Publications
Account settings

Preview improvements coming to the PMC website in October 2024. Learn More or Try it out now .

Advanced Search
Journal List
PeerJ Comput Sci

News sensitive stock market prediction: literature review and suggestions

Associated data.

The following information was supplied regarding data availability:

This is a survey article, therefore there is no raw data or code.

Stock market prediction is a challenging task as it requires deep insights for extraction of news events, analysis of historic data, and impact of news events on stock price trends. The challenge is further exacerbated due to the high volatility of stock price trends. However, a detailed overview that discusses the overall context of stock prediction is elusive in literature. To address this research gap, this paper presents a detailed survey. All key terms and phases of generic stock prediction methodology along with challenges, are described. A detailed literature review that covers data preprocessing techniques, feature extraction techniques, prediction techniques, and future directions is presented for news sensitive stock prediction. This work investigates the significance of using structured text features rather than unstructured and shallow text features. It also discusses the use of opinion extraction techniques. In addition, it emphasizes the use of domain knowledge with both approaches of textual feature extraction. Furthermore, it highlights the significance of deep neural network based prediction techniques to capture the hidden relationship between textual and numerical data. This survey is significant and novel as it elaborates a comprehensive framework for stock market prediction and highlights the strengths and weaknesses of existing approaches. It presents a wide range of open issues and research directions that are beneficial for the research community.

Introduction

Stock market trends are extremely volatile in nature that makes prediction quite hard. This volatile nature attracts researchers to investigate sophisticated techniques for better prediction. Prediction of stock market trends with high accuracy generates significant revenue. Fundamental and technical analyses are two basic approaches used for stock trend prediction. Technical analysis inspects past data and volumes of stock prices while fundamental analysis not only considers stock statistics but also evaluates industry’s performance, political events, and economic circumstances ( Patel et al., 2015 ; Milosevic, 2016 ). Fundamental analysis is more realistic because it evaluates the market in a broader scope. This survey puts emphasis on research work based on fundamental analysis, where textual data is considered along with stock price historical data for stock trend prediction.

There are many sources of textual data like news, tweets, and annual reports, etc. which could be analyzed to mine significant information. Textual data, especially news, is a better source of hidden information than numeric data because it permits to predict financial trends with its justification ( Chan & Franklin, 2011 ). For instance, a news article on a company with words or phrases like “resignation”, “risk of default” helps the investor to predict a decrease in the company’s stock prices. Furthermore, news about many uncertain factors can affect stock market trends ( Nassirtoussi et al., 2015 ). For instance, economic and political shocks, war, civil unrest, terrorism, and natural disasters etc. Therefore, there is a great need for better knowledge discovery mechanisms from textual data.

Feature extraction is a fundamental step in prediction where input data is reduced into more manageable form for further processing. Most of the previous work on news sensitive stock trend prediction adopted shallow features extraction techniques which are unstructured and where words are represented as features. For instance, Bag-of-Words (BoW), noun phrases, and named entities ( Schumaker & Chen, 2009 ). This is contrary to the structured feature extraction technique where a combination of words, nouns, and verbs are used. Unlike structured feature extraction techniques, shallow feature extraction techniques are not able to capture a complete event in the form of structured entity-relation information. Consequently, shallow features make it complicated to represent the impact of news events on stock market trend prediction ( Ding et al., 2014 ).

Events extracted from news articles may play a significant role in stock market trend prediction. Sophisticated Natural Language Processing (NLP) technologies enable more accurate structured representation of events than shallow features. But structured representation of events increases sparsity, which most probably decreases the predictive power ( Ding et al., 2015 ). This issue is solved by using event embedding, which are dense vectors. Event embedding is used to reduce sparsity due to structured representation of events by comprising syntactically or semantically similar events into similar vectors. But event embedding suffers from some limitations due to the lack of background knowledge ( Ding et al., 2016 ). For instance, two events with similar words may have similar embedding. However, they do not have any causal or logical relation. Integrating knowledge base in learning event embedding will result in better event embedding.

Shallow features and event based features represent facts about text while expressing subjective textual information is another way for text analysis. Sentiment analysis is a widely used approach to infer emotion from textual data that represents subjective information. It is a major area of interest in today’s text analytics. In news sensitive stock prediction, it is used to extract news polarity using machine learning and sentiment dictionary based approaches ( Li, Wu & Wang, 2020 ).

Features extracted from input are fed into machine learning algorithm for prediction. In existing literature, two types of machine learning techniques are used for stock trend prediction. Shallow learning is a machine learning technique, where composition layers are few like Support Vector Machine (SVM) and Artificial Neural Network (ANN). While deep learning technique contains many hidden layers like Convolutional Neural Network (CNN). The elegance of deep learning is to extract features and learn classification ( Pasupa & Sunhem, 2016 ; Vargas, De Lima & Evsukoff, 2017 ; Dutta, 2018 ; Long, Lu & Cui, 2019 ).

Considering the impact and potential of news on stock market performance, there is a significant need to analyze, assess, and evaluate techniques of news sensitive stock market prediction. This paper is motivated to address this need. In the context of news sensitive stock trend prediction, we have three questions to direct this research work: (i) What is the generic methodology to perform prediction? (ii) Which approaches have been used for text processing and how these existing approaches can be improved? (iii) Which machine learning algorithms have more potential to model the selected domain?

Based on the above research questions, an extensive literature review related to stock prediction is discussed and guidelines are suggested to forecast the influence of news events on the stock market. The main attainments of this paper are as follows:

Study of stock trend prediction based on financial time series data and preprocessing of financial data.
Research based on text preprocessing and feature extraction techniques.
Investigation about prediction algorithms to analyze the influence of textual and numerical features on the stock market.
Based on reviewed literature, solutions are identified that guide to resolve challenges found in the generic phases of a stock prediction task.
Discussion about open issues and research directions, which can be further investigated and explored by the community.

This survey is useful in understanding the needs, fundamentals, frameworks, and techniques for stock market trend analysis. This research work is organized in eight sections. Survey organization is shown in Fig. 1 and survey methodology is discussed in “Survey Methodology”. Generic methodology is discussed along with main challenges in “Generic Methodology for News Sensitive Stock Trend Prediction”. In “Key Concepts”, important terminologies and key concepts are discussed. Literature review is presented in “Literature review”. Furthermore, in “Discussion”, discussion on the reviewed literature techniques is presented along with opportunities suggested to tackle challenges identified in “Generic Methodology for News Sensitive Stock Trend Prediction”. “Open Issues and Research Directions” discusses the open issues and research directions and “Conclusions” concludes this review.

An external file that holds a picture, illustration, etc.
Object name is peerj-cs-07-490-g001.jpg

Survey methodology

This survey is based on research articles from leading journals and conferences. The main research questions that lead this survey: (i) What are the machine learning methods proposed in literature for stock prediction using financial and textual data? (ii) What are the techniques proposed in literature for preprocessing and feature extraction of financial and textual data? (iii) What are the key challenges, opportunities, and open issues in this domain?

“Stock prediction”, categorically is the seed key term for searching relevant literature in this research. Moreover, other search terms are combined using Boolean operator AND with seed term in order to make the searching criteria more specific. This research study emphasizes the domain of stock prediction using machine learning so the other relevant terms are “machine learning”, “artificial intelligence”, “artificial neural network”, “deep learning”, “stock price”, “news headlines”, “event extraction”, “text mining”, “sentiment analysis”, “sentiment lexicon”, and “time series analysis”.

Initially more than 100 papers are selected but not all are included. The main criterion to exclude some of the research papers are their research objective, which do not come in the scope of this study. To cover the domain background as well as state of the art techniques in a wide range, papers were selected from the year 1999 to 2021. Finally, 99 papers are included in this survey. Chart in Fig. 2 shows the number of selected research papers per year.

An external file that holds a picture, illustration, etc.
Object name is peerj-cs-07-490-g002.jpg

Generic methodology for news sensitive stock trend prediction

Figure 3 shows the stock prediction generic methodology along with its main phases. The presented generic methodology is adapted from Nassirtoussi et al. (2015) and Cavalcante et al. (2016) . It is divided into three phases where the first phase performs data collection. Stock market data can be downloaded from their official websites. Similarly, online news can be downloaded or scrapped from relevant websites. This data can be stored in different file formats for further processing. For example, a CSV (“comma-separated values”) is a simple file format used to store tabular data.

An external file that holds a picture, illustration, etc.
Object name is peerj-cs-07-490-g003.jpg

Second phase performs data preprocessing and feature extraction. In the context of news sensitive stock prediction, numerical and textual data is processed separately. Market data is preprocessed to remove inconsistencies and noise to achieve better prediction accuracy. It is also processed in order to select features and derive new features from existing one to align with textual features ( Chen et al., 2019 ; Picasso et al., 2019 ; Li, Wu & Wang, 2020 ; Liu et al., 2020 ).

Financial time series data encompasses dynamic and nonlinear historical data. This characteristic makes trend analysis task and prediction harder ( Cavalcante et al., 2016 ). While noise and outliers are imperfect observations exist due to some error or abnormal situations taking place during data collection. Noise can be caused by human error or machine error while outliers can be caused by experimental error. These observations originate inconsistency in the data set and may cause poor data modelling along with poor forecasting. It is an important challenge related to preprocessing of data ( Yang & Wu, 2006 ; Grané & Veiga, 2010 ; Cavalcante et al., 2016 ).

Text mining is a process to derive meaningful information from raw text. In text mining, text preprocessing is also required in order to remove garbage from text. Then features are extracted from text. In feature extraction, text is parsed to extract features that best reflect the text contents. In the next step, an optimal subset of features is selected that contains all the relevant information. Moreover, features are represented by transforming the selected features into machine readable format (e.g., document vectors) ( Hagenau, Liebmann & Neumann, 2013 ).

The main difficulty in text mining for extracting facts is unstructured form of data ( Cho, Wüthrich & Zhang, 1999 ). Text mining is still an emerging field and the problem of high dimensionality and ignorance of semantics are not tackled strongly in previous literature ( Nassirtoussi et al., 2015 ). Sophisticated techniques are required to extract useful patterns from online unstructured text as it contains massive information ( Sumathy & Chidambaram, 2013 ).

Alternatively, text mining can be performed using sentiment analysis. There are two main approaches to perform sentiment analysis: corpus based and sentiment dictionary or lexicon based ( Taboada et al., 2011 ). So there is a question that which approach is better to adopt in financial domain?

The processed form of textual and numerical features is aligned and given as input into machine learning algorithms in order to learn the market volatility. In the third phase of the generic methodology, machine learning algorithm models the input data and generates predictive signals. Then these predictive signals are used to evaluate prediction accuracy of the proposed approach. Prediction accuracy is estimated by means of machine learning accuracy measures ( Cavalcante et al., 2016 ).

There are complex relations between textual data and historic market data which can be influenced by hidden factors ( Ding et al., 2014 ). In order to capture the influence of textual data over stocks price history an efficient classifier is required.

There are different opportunities proposed in literature to cater the above identified challenges with their limitations and strengths. Before literature review, next section gives a clear understanding of domain under consideration for its readers.

Key concepts

This section briefly explains the domain of stock prediction using news analysis and defines the key concepts that provide the foundation for the whole discussion. Firstly, it introduces the domain and terms related to stock market prediction. Secondly, it discusses the stock quotes or financial data, derived features and their use. It thoroughly discusses the techniques that clean textual data and extract significant features in it. Then it discusses the prediction algorithms that exploit the combination of numerical and textual data for prediction. Finally, it discloses the most basic evaluation measures to assess the performance of prediction algorithms.

Stock market prediction related terms

Stock market is a place where corporations publicly trade their stocks to gain funding in order to expand their business. Stocks can be sold or purchased if their companies are listed in the stock market. For instance, in April 2020, there were 542 companies listed in Pakistan’s stock market known as Pakistan Stock Exchange (PSX): http://www.ksestocks.com/AboutPSX .

Stock is a common term, describing the aggregation of shares in a company. While share is a documented ownership issued by the company. This ownership can be any percentage like 10% or 20% ownership depending on the amount of shares an investor has. If a company makes profit, then all shareholders get share of this profit according to their percentage of ownership and their stock price goes up. In the same way if the company is in loss then all shareholders have share in loss as well and their stock price goes down ( Setty, Rangaswamy & Subramanya, 2010 ).

Time series data

Time series is a set of numerical data points collected at successive points at regular intervals of time. Mathematically, it can be defined as a set of vectors X(t), t = 0, 1, 2, 3…. where t denotes the time interval.

A time series consists of four components: Trend (T), Seasonal (S), Cyclic (C), and Irregular (I). Trend component (T) is an outcome of long term gradual movement in same direction like increase in prices or pollution. Seasonal component (S) shows short term movement of time series data influenced by seasonal factors like sale of heater in cold weather. Cyclic component (C) shows long term rises and falls that are not of fixed period. Irregular or random component (I) is a short term movement that is unpredictable caused by external factors like war, earthquake, flood, etc. All these four components are combined using additive or multiplicative model to form a time series ( Idrees, Alam & Agarwal, 2019 ).

Where X(t) is a time series observation in Eqs. (1) and (2) .

Techniques for time series prediction can be classified as statistical techniques and machine learning/artificial intelligence (ML/AI) techniques. Example of statistical techniques are Auto Regressive Integrated Moving Average ARIMA, Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA), singular value decomposition (SVD), Dynamic Mode Decomposition (DMD), etc. Examples of (ML/AI) techniques are Support Vector Machines (SVM), Neural Networks (NN), Deep Neural Networks (DL), etc.

Financial data as a time series

Stock prices are treated as time series data. Stock market provides stock quotes or stock price such as Open, Close, Low, High and Volume etc., along with stock symbol and transaction date. These basic quotes give information such as high and low prices of stock in a day or its change in the value.

These financial indicators can be used directly in prediction models as a dependent or independent variable. Such as Close price is used as a dependent variable or label in prediction models ( Lin, Yang & Song, 2009 ; Rustam & Kintandani, 2019 ). Moreover new features are also derived from the existing one such as gain in ( Garcia-Lopez, Batyrshin & Gelbukh, 2018 ; Mourelatos et al., 2018 ).

In Eq. (3) , Close_Price t-1 and Close_Price t are close prices for previous and current day.

Trend is another derived attribute that shows upward or downward movement of a stock’s price over time. Trend is derived using the formula:

In Eq. (4) , Close_Price diff is a difference between current and previous day close prices. If obtained difference is less than or equal to 0s it means stock is down or no change in stock and Trend is 0 . If difference is greater than 0, it means stock is up and Trend is 1 , stated in Eq. (5) ( Ding et al., 2015 ; Vargas, De Lima & Evsukoff, 2017 ; Liu et al., 2018 ).

On the other side technical indicators are the measures that facilitate stock prediction in order to identify the strength and direction of stock trends. In literature, technical indicators have been used with textual features ( Li, Wu & Wang, 2020 ). Basically, they are derived from stock price data. For instance, Moving Average (MA) is a technical indicator that identifies the direction of current price trend. Moving Average Convergence Divergence (MACD), Relative Strength Index (RSI), and Money Flow Index (MFI) are some of the examples of technical indicators used in literature. For description about technical indicators ( Gao & Chai, 2018 ; Alonso-Monsalve et al., 2020 ) are suggested.

Textual data

Before feeding textual data to any machine learning model, it should be preprocessed. Data should be cleansed to remove garbage before feature extraction so that it doesn’t produce garbage when fed into machine learning models.

Textual data cleansing

Data cleansing is a basic preprocessing step employed to remove unwanted text. There are many approaches involved in data cleansing, and different approaches lead to different results in the model training phase. Moreover, different kind of data (sound, image, speech, etc.) is cleansed by using different approaches. Some basic techniques related to textual data cleansing are discussed below:

Removing unwanted characters

Occasionally, data is scraped from a web page like news, and reviews etc. Scraped data may contain html tags, punctuation, and any character which is not a part of the language. These unwanted characters should be filtered out. In case of tweets, there are hash tags, URLs, mentions, and reserved words etc., that are removed to clean the tweets for further processing ( Symeonidis, Effrosynidis & Arampatzis, 2018 ).

Tokenization

Tokenization splits text into meaningful chunks and those chunks are called tokens. A token can be a word or a sentence. Tokenization provides the basic unit for further text processing steps. In Uysal & Gunal (2014) , Nassirtoussi et al. (2015) and Symeonidis, Effrosynidis & Arampatzis (2018) tokenization is applied to extract a list of separate words from text.

Lowercasing is a very common step of text cleansing. It converts whole text into lowercase so that the same words are merged that reduces text dimensionality. In Uysal & Gunal (2014) , lowercasing leads to improvement in accuracy.

Removing punctuation

There are some scenarios where punctuation adds extra meaning like tweets’ sentiment analysis. For instance, an exclamation mark may increase the intensity of positive or negative remarks. Hence, removing punctuation in that scenario might reduce the accuracy. For other scenarios, removing punctuation is a common preprocessing task where punctuation doesn’t add extra meaning ( Schumaker & Chen, 2009 ; Uysal & Gunal, 2014 ; Symeonidis, Effrosynidis & Arampatzis, 2018 ). For instance, text representation into Bag of Words (BoW) considers multiplicity of a word in text such as a sentence or a document. Moreover, the occurrence of words in a text is used as an input feature to train a classifier. So removing punctuation in case of BoW is significant.

Removing stop words

Stop words are considered irrelevant and needless to analyze in text before passing to machine learning algorithms. They are high frequency words but don’t contain significant information for examples articles, conjunctions, prepositions etc. This preprocessing step reduces the dimensionality of term space as well. In literature mostly, stop words are removed before further text processing like in Zhai, Hsu & Halgamuge (2007) , Guzman & Maalej (2014) , Uysal & Gunal (2014) and Symeonidis, Effrosynidis & Arampatzis (2018) .

Parts-of-Speech (POS) tagging

In POS tagging, each word of a sentence is assigned a label like noun, verb, adjective etc. In text preprocessing, the purpose of POS tagging is to identify and extract specific words that have some worth in different scenarios. Like for sentiment analysis, only nouns, verbs, and adverbs are extracted in Symeonidis, Effrosynidis & Arampatzis (2018) and similarly nouns, verbs, and adjectives are taken as input feature in Guzman & Maalej (2014) . Likewise, noun phrases are extracted for textual analysis in Schumaker & Chen (2009) . In Ding et al. (2015) , events are extracted from news headlines. Each word in a news headline is labeled as noun, verb, adjective, and adverb etc. Then noun, verb, and object are extracted to form an event representation. Then this event representation is used to predict stock movement using deep learning. In Zhai, Hsu & Halgamuge (2007) , words are replaced by their generalized concepts using WordNet, where words are grouped semantically. While POS tagging facilitates in disambiguating the word when assigned to WordNet.

Word normalization

In English language, words have different forms like wolf or wolves, talk or talks, and write or wrote. Word's different forms actually represent the common base form of a word because these forms are semantically similar. Word normalization merges all forms into a single base or root form of a word. It reduces the feature space by reducing different forms of a word into a single one. Stemming and lemmatization are two ways in NLP to perform word normalization. Stemming converts different derived forms of a word into its root form, also known as stem, by removing the endings of the words like ‘s’, ’ies’, ’ing’ etc. It is a crude way of word normalization performed by just defining some rules of chopping off some characters at the end of the word ( Uysal & Gunal, 2014 ; Singh & Gupta, 2017 ; Symeonidis, Effrosynidis & Arampatzis, 2018 ). It is a widely used method and mostly gets good results ( Mejova & Srinivasan, 2011 ). Lemmatization is another method for word normalization. It merges different forms of a word into a root form, also called lemma, by using morphological rules and vocabularies ( Singh & Gupta, 2017 ). Lemmatization is a comparatively more systematic and effective method than stemming ( Symeonidis, Effrosynidis & Arampatzis, 2018 ). In Guzman & Maalej (2014) , it is used to reduce features for sentiment analysis.

Textual data processing for feature extraction

There are several methods proposed to analyze textual data. The two common methods to mine information are based on objective and subjective information. Objective information extraction techniques deal with facts in the form of shallow features and structural or event representation. While the other promising technique is about subjective information that is sentiment analysis.

Shallow features

BoW, noun phrases, and name entities are examples of shallow or simple features. BoW is the most basic shallow feature based technique used for text mining based stock market prediction problems. In this technique, text is broken into words and each word is considered as a feature ( Luss & D’Aspremont, 2015 ; Garcia-Lopez, Batyrshin & Gelbukh, 2018 ). In its basic form, order and co-occurrence of words are not considered. N-gram is an adjacent sequence of n words. For instance, unigram where n = 1 and each word is considered as a feature is same as BoW ( Hagenau, Liebmann & Neumann, 2013 ). But for bigram and trigram, two and three contagious words are considered as an entity. Moreover, two other feature selection techniques namely, named entities and noun-phrases were explored. In noun-phrases, noun POS is identified by using lexicon. While syntactic rules facilitate recognizing noun phrases. In the later one, a pre-defined categorization scheme such as name of person, organization, location etc., is used in order to locate and classify named entities ( Schumaker & Chen, 2009 ; Nassirtoussi et al., 2014 ). Word embedding is another form of representing text vocabulary. It is a dense representation for text where words that have the same meanings have similar representations ( Garcia-Lopez, Batyrshin & Gelbukh, 2018 ).

An event is a specific type of knowledge that can be extracted from text and contains entity-relation information. Formally, an event is defined as E = (A,P,O,T), where P is the action, A is the actor that performs the action, O is the object on which the action was performed and T is the timestamp used for aligning textual data(news) with numerical data(stock price) ( Ding et al., 2014 ). For example, the structure representation of the event “19 February 2014—Facebook buys WhatsApp for $19 billion.” is represented as: “(Actor = Facebook, Action = buys, Object = WhatsApp, Time = 19 February 2014)” ( Ding et al., 2015 ).

Event extraction approaches

There are three approaches for event extraction namely data-driven which relies on large corpus of data, knowledge base which makes use of domain knowledge, and hybrid approach where both techniques are combined. All three approaches are listed in Table 1 . Data-driven approaches require large text corpus and are based on quantitative approaches to automate text processing, for instance linear algebra, probability theory and information theory. All approaches focus on determining statistical relation without considering semantics explicitly. Since data-driven approaches do not rely on domain knowledge, no expert knowledge is required ( Hogenboom et al., 2016 ).

Knowledge-driven event extraction methods use domain knowledge in the form of ontologies and patterns that state rules. There are two types of patterns used in knowledge-driven event extraction. The first one is a combination of lexical representation and syntactic information. While the second one is more expressive, and combines lexical representation with syntactic and semantics information. Semantics are usually added by means of ontologies. Due to the use of patterns, knowledge-driven approaches require less training data and results are interpretable and traceable ( Hogenboom et al., 2016 ).

The hybrid approach is a combination of data-driven and knowledge-driven approaches. When domain knowledge is not enough, statistical methods are integrated for compensation. While result interpretability is difficult due to less expressive patterns. So the combination of multiple techniques increase the complexity and require expertise ( Hogenboom et al., 2016 ).

Significance of event embedding

Structured representation of events increases sparsity, which most probably decreases the predictive power ( Ding et al., 2015 ). In literature, this issue was solved by using event embedding, which are dense vectors. Event embedding uses the principle that syntactically or semantically similar events should comprise similar vectors. They are used to reduce sparsity due to structured representations of events and capture both the syntactic and the semantic information among events.

Sentiment analysis

Sentiment analysis mines the class of emotion from text and assigns a score. The process of sentiment analysis uses NLP and machine learning based approaches. Initially, it breaks down the text into parts like document, paragraph, sentence, phrase, or a word according to required granularity. Then the process of sentiment analysis suggests appropriate sentiment class along with score for the part of text under consideration.

There are two approaches to perform sentiment analysis on textual data. First method uses machine learning algorithm on labeled text in order to create a trained model. Then the train model is used for unseen data to perform sentiment analysis. Second approach is based on rule based sentiment analysis. These rules are also known as sentiment lexicon and inferred by language experts. In this method, text is tokenized into words. Then some preprocessing steps like stop words removal and punctuation removal etc. are performed in order to cleanse the data. Filtered words are classified as positive, negative, or neutral class on the basis of their corresponding intensity measures ( Taboada et al., 2011 ).

Sentiment dictionary or lexicon are constructed using two approaches. In first approach, dictionaries are constructed manually by experts where they also suggest rules to analyze sentiments. For instance, Vader ( Hutto & Gilbert, 2014 ), Harvard IV (HIV4) ( http://www.wjh.harvard.edu/~inquirer/homecat.htm ), and Loughran and McDonald (LM) ( Loughran & McDonald, 2011 ) are sentiment dictionaries constructed using first approach. Second approach is semi-automated and constructs sentiment dictionary in two steps. In the first step it adopts the same approach but for small dataset where experts manually perform sentiment analysis. Then automated extension methods are applied to construct sentiment dictionary. SentiWordNet 3.0 ( Baccianella, Esuli & Sebastiani, 2010 ) and SenticNet 5 ( Cambria et al., 2018 ) are the examples of sentiment dictionaries constructed using semi-automated approach.

Sentiment dictionaries can be further categorized on the basis of their domain. Like there are general purpose sentiment dictionaries like Vader and HIV4 that can be used in any domain. Domain Specific sentiment dictionary is another type like LM sentiment dictionary which is constructed for financial domain ( Li, Wu & Wang, 2020 ).

Data analytics is a process to examine dataset in order to draw conclusions about the information they contain. There are commercial news analytic vendors like Thomson Reuters and RavenPack who convert the news text into sentiment scores. In Allen, McAleer & Singh (2017) and Deveikyte et al. (2020) news sentiment scores are developed by Thomson Reuters and RavenPack. These sentiment scores are used to study the relationship between financial news sentiment scores and stock prices data.

Machine learning algorithms

Machine learning enables system to automatically learn and improve itself from experience. The learning procedure extracts patterns from available data and for unseen data these patterns are used for making predictions. The learning procedure is divided into supervised and unsupervised learning. In supervised learning data used for learning is properly labelled or classified. While in unsupervised learning data is not labelled and it is explored to identify the hidden patterns ( Tabares-Soto et al., 2020 ). In this section some supervised learning techniques are discussed which are commonly used in literature for stock prediction using textual analysis.

Naive bayes (NB)

Naïve bayes is a probabilistic machine learning algorithm. It is based on Bayes theorem. It is a simple but powerful prediction algorithm used mostly in text classification tasks such as spam filtering. Classification is based on the joint probabilities of features and classes. It is called naïve because features passed into the model are independent to each other. That is changing the value of one feature, does not directly influence or change the value of any of the other features used in the algorithm ( Dada et al., 2019 ). In text mining where the number of features are large, naïve assumption simplifies learning and NB outperforms SVM and KNN ( Yang & Liu, 1999 ).

K-nearest neighbors (KNN)

It is one of the simplest supervised machine learning algorithms with no or very minimal training phase. It does not make any underlying assumptions about the distribution of data because it is non parametric. Classifications are made using training data by matching the test example with each and every training example. That is, a complete training dataset is used to predict class for every test example. This makes KNN inefficient in terms of time and memory. Class is assigned to test instance on the bases of K similar nearest training instances where the optimum value of K is mined through tuning ( Groth & Muntermann, 2011 ; Dada et al., 2019 ).

Support vector machines (SVM)

Support Vector Machines are supervised machine learning algorithms. The main notion behind SVM is about finding a hyperplane that best classifies the data points. There might be many candidate hyperplanes for data segregation but the best one has a maximum margin that is the maximum distance between data points and a hyperplane. Maximizing marginal distance enhances classification accuracy when test data points are classified. Another, promising feature of SVM is that they shift data points into higher dimensional space if they are not separable in present dimensional space. SVM are slow but provide higher accuracy than other prediction algorithms and this feature makes SVM a preferable choice for text categorization related applications ( Cho, Wüthrich & Zhang, 1999 ; Dadgar, Araghi & Farahani, 2016 ; Dada et al., 2019 ).

Artificial neural network (ANN)

Artificial Neural Network is a complicated but powerful machine learning technique that imitates the human brain functionality. ANN is a group of neurons, the basic processing unit, which are interconnected and communicate with one another through weighted connections. Neurons take inputs, multiply these inputs with their corresponding weight and add them to get a value, and finally pass this value to an activation function to produce an output. These neurons are arranged in a manner so that they form a layer. ANN consists of three layers namely input, hidden, and output. Each layer consists of at least one neuron. There might be more than one hidden layer. Neural networks are trained using large number of iterations. One iteration to train a neural network has two steps. In the first step, neural network receives input through the input layer and passes it to the hidden layer(s), then the output of the hidden layer(s) moves towards the output layer. Finally, output is generated by the output layer of the neural network. This is called feed-forward propagation. The estimated output is compared with actual output in order to calculate the prediction error. Second step is called backpropagation step. It starts by adopting gradient based method. This gradient based method is used to find optimized weight values in neural network. Gradient based method calculates derivative of error with respect to each weight using chain rule. Backpropagation step is completed when all weights are updated. ANN is a powerful machine learning algorithm used as a text classifier in literature ( Groth & Muntermann, 2011 ; Dada et al., 2019 ). In Fig. S1 , structure of ANN is illustrated.

Deep learning

Deep learning is a sub field of machine learning, based on ANN. It is a new emerging area, also known as Deep Neural Network (DNN). The promising feature of deep learning is that it can learn features from data directly using multiple non-linear hidden layers. Deep learning models are able to solve sophisticated problems and their effectiveness becomes more prominent as training data increases. These models exploit the computation power of modern Central Processing Unit (CPU) and Graphics Processing Unit (GPU) to facilitate heavy processing.

Deep learning models face a challenge of vanishing gradient which slows down model’s training and in worst case stops it. This problem arises in deep neural networks due to deep hidden layers and it makes difficult to learn weights of earlier layer. According to chain rule, derivatives are multiplied to each other from last layer to first layer in order to compute the derivative of initial layers. If derivatives are small values, multiplication makes it so small. In this situation weights are not updated effectively which increases inaccuracy in a model prediction ( Charniak, 2019 ; Hu, Zhao & Khushi, 2021 ).

Convolutional neural networks (CNNs)

CNN is a deep learning model inspired by human vision mechanism and used across different applications especially in image processing and video processing tasks. Later, CNNs are enhanced for text classification. For instance, in Dada et al. (2019) , CNN is successfully used for email spam filtering. The basic architecture of CNN is composed of several layers, for instance input layer, convolution layer, pooling layer, and fully connected layer. It is illustrated in Fig. S2 .

Input is fed through input layer in the form of three dimensional array. Convolution layers are used to extract features. While former convolution layer extract features more abstractly than later convolution layer. These layers have filters that are small matrices to perform convolution operation. Pooling layer reduces the spatial dimension of input that is why it is used after convolutional layers. Convolution and pooling layers reduce the input dimension which in turns speed up computation. Last layer is fully connected where each neuron is connected to the neurons in the previous layer. An important feature of CNN is weight sharing which makes it less complex than fully connected neural network ( Goodfellow et al., 2016 ; Charniak, 2019 ; Hu, Zhao & Khushi, 2021 ).

Recurrent neural network (RNN)

RNN architecture is used to learn from sequential data. ANN models are used for traditional predictive analysis is not suitable for sequential data because observations in sequential data is not independent to each other. Traditional neural network treats each input as an independent entity. RNN preserves previously processed observations using hidden state and uses it along next observation going to be processed. Information in RNN travels through loop which makes it possible to use same parameters and reduces the parameter complexity than other NN models. But RNN doesn’t support long-term memory and faces vanishing gradient problem ( Goodfellow et al., 2016 ; Charniak, 2019 ; Hu, Zhao & Khushi, 2021 ). The architecture of RNN is shown in Fig. S3 .

Long short term memory (LSTM)

LSTM is a variant of RNN. It is also designed to support sequential data processing. It tackles the shortcoming of RNN like vanishing gradient and no support for long-term memory by introducing its gating mechanism. It has three gates which are input gate, output gate, and forget gate. Three types of information are passed into LSTM cell, the current input, hidden state (short-term memory), and cell state (long-term memory). The forget gate decides which information in cell state should be kept or discarded. While the input gate is responsible for what new information should be stored in cell state. The output gate receives current input, previous hidden state, and newly computed cell state in order to generate new hidden state and output for current input observation in sequence ( Goodfellow et al., 2016 ; Charniak, 2019 ; Hu, Zhao & Khushi, 2021 ). Architecture of LSTM is depicted in Fig. S4 .

Evaluation measures

The performance of the classification algorithms is measured using accuracy, precision, recall, and F1-score ( Li, Wu & Wang, 2020 ). In order to understand these basic metrics, there are some key terms which are the primary building blocks of these metrics. For instance, a true positive (TP) is a number of correctly predicted positive classes by the prediction model. Similarly, a true negative (TN) is the number of correct predictions for a negative class. A false positive (FP) is known as the number of incorrect predictions of a positive class made by the prediction model. While a false negative (FN) is an outcome where the model creates incorrect prediction for the negative class ( Patel et al., 2015 ). By using these key terms, evaluation metrics can be defined as:

Accuracy is the measure of correctly predicted instances divided by total number of instances.

Precision is the ratio between correct predictions for a class and total number of predictions for a class.

Recall is the ratio between correct prediction of a class and total number of actual predictions of a class.

F1-Score is also known as F-measure. It performs well if the prediction algorithm is dealing with uneven class distribution. It is a weighted average of Precision and Recall measures.

The above measures formulated from Eq. (6) to Eq. (11) , are particularly used to evaluate performance of classification algorithms. On the other hand, there are other basic measures which are used to measure the performance of regression algorithms where predicted value is a continues value. Evaluation metrics for regression problems identify the difference between actual and predicted value ( Gao, Chai & Liu, 2017 ; Jin, Yang & Liu, 2019 ). Some of the basic evaluation measures for regression problem are discussed below:

Mean square error (MSE)

Mean square error is a mean squared difference between actual and predicted values identified by regression task.

Root mean square error (RMSE)

Root mean square error is a square root of mean squared difference between actual and predicted values.

Mean absolute error (MAE)

Mean absolute error is a mean of absolute difference between actual and predicted values.

n = number of observations

For further detail about evaluation measures for regression task ( Hyndman & Koehler, 2006 ) is suggested.

Literature review

This section briefly outlines the research on stock prediction techniques. It summarizes the techniques that only consider numerical financial data for stock prediction. Then it discusses the feature extraction from textual data. It also summarizes the prediction algorithms that exploit the combination of numerical and textual data for prediction.

Forecasting stock trend using time series data

Time series analysis for stock prediction using statistical approaches and machine learning-based approaches have been adopted for many years. Both approaches have their own limitations and strengths. Initially, researchers adopted statistical approaches like authors in ( Huang, Nakamori & Wang, 2005 ) identified the financial movement direction of the NIKKEI 225 index by using different techniques. Forecasting performance of Support Vector Machine (SVM) is better than Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA) and Elman Backpropagation Neural Networks (EBNN). Combination of all these forecasting models further improves forecasting accuracy. A major limitation of this work is that the authors have used weekly price movement. While daily price movement would have given more insight in perceiving price volatility.

Ou & Wang (2009) examined the Hang Seng index of Hong Kong stock market. Experiments are performed using ten data mining techniques to predict price movement of the Hang Seng index of Hong Kong stock market. These techniques include k-Nearest Neighbor (k-NN) classification, LDA, QDA, Naive Bayes (NB), Neural Network (NN), Logit model, Tree based classification, Bayesian classification with Gaussian process, SVM and Least Squares SVM (LS-SVM). It is observed that each of the algorithms has its own limitations. The prediction model accuracy depends on input representation, feature selection, and prediction algorithm. SVM and LS-SVM performed best among the other models. Their empirical data is detailed and gives the right direction to further enhance the strength of prediction by combining these techniques. This paper provides a useful baseline for further investigation to improve prediction performance.

Dynamic Mode Decomposition (DMD) is relatively a newer approach to analyze time series data. DMD takes advantage of coherent structures in time series. These coherent structures are called DMD modes and give more physical insight than curve fitting. The application of trading strategy based on DMD is proposed in literature by many researchers to identify evolutionary patterns from stock price data ( Mann & Kutz, 2016 ). The strength of DMD is demonstrated in Cui & Long (2016) for Chinese A-share stock market for modeling the market patterns well in sideways market. Hua et al. (2016) extract cyclic activity and price prediction from daily stock price data using DMD method. In Kuttichira et al. (2017) adopted DMD for short term stock price prediction. They used minute-wise data of National stock market for predicting the price in next few minutes. Prediction was performed for intra and inter sector companies and evaluated using mean absolute percentage error. Time series analysis using DMD based approach is useful because DMD is an effective and computationally efficient approach but it is lacking in improving prediction performance for unexpected changes like stock split.

Sun & Li (2012) proposed a new SVM ensemble method for Financial Distress Prediction (FDP). Their main contribution is the technique for the selection of SVM ensemble’s base classifiers on the basis of individual SVM classifier’s performance. It was proved by experimental outcomes that the SVM ensemble was significantly better than individual SVM classifier. They have relied on financial ratios and ignored the importance of market factors such as firm size, and volatility etc. in predicting financial distress. Moreover, they have used a small dataset of five years for this study. Huang, Huang & Han (2012) adapted Support Vector Regression (SVR) as a forecasting technique for financial time series data. Furthermore, SVR kernels are built using wavelet functions. Comparative analysis showed that it is better than standard kernel function. Their proposed technique improved prediction performance but a use of wavelet function as a de-noising technique along with SVM is missing. Comparative analysis between both uses of wavelet function will give more insight for better utilization in dealing with time series data.

Cheng & Wei (2014) proposed SVR with Empirical Mode Decomposition (EMD) to predict stock price. Then in order to show its performance, comparison with a simple SVR model and Autoregressive (AR) model is performed. EMD is an adaptive approach that can decompose nonlinear time series into intrinsic mode functions (IMF). It can suppress continuous noise. In Yujun, Yimei & Jianhua (2020) , EMD and its variant Ensemble EMD, which are better in dealing noise, are adopted to decompose a time series into subsequences. Then LSTM is used to learn and predict from these subsequences. Finally, subsequences prediction results were combined to obtain the estimation of original time series. This is a good attempt and they have taken the advantage of statistical and machine learning methods as a hybrid approach. Moreover, other preprocessing techniques can also be employed in comparison with EMD to better understand its strength. Furthermore, the use of technical indicators can enhance the prediction performance.

The financial time series data has characteristics of nonlinearities and discontinuities which add complexity in forecasting techniques. Artificial Neural Network (ANN) is getting popular because it can handle the complex characteristics of financial time series data. The Istanbul Stock Exchange stock prices volatility is forecasted using ANN and SVM prediction models. Comparisons show that ANN achieves better results as compared to SVM ( Kara, Boyacioglu & Baykan, 2011 ). However, the use of fundamental factors as input variables besides technical indicators are missing. ( Ticknor, 2013 ) used Bayesian Neural Network (BNN), a kind of ANN with tangent sigmoid function as a hidden layer transfer function to predict stock price. In this study, very limited number of technical indicators are used as a set of input features. While there is still a space for further investigation to analyze prediction performance on different combinations of technical indicators.

Liu & Wang (2012) concentrated on the prediction of price fluctuation in the stock market and proposed a three layered improved Legendre Neural Network (LNN) model. Further improvement can be made in random time strength function for different financial markets to increase the prediction accuracy. Shahpazov, Velev & Doukovska (2013) examined three prediction models to predict Bulgarian stock market. They used Multi-Layer Perceptron (MLP), Radial Basis Function (RBF) neural network, and General Regression Neural Network (GRNN). While the performance of GRNN was better than others. When dealing with neural networks, dataset size plays an important role in optimizing model’s performance. But the dataset incorporated by them is limited in size. Wang & Wang (2015) proposed Stochastic Time Effective Function Neural Network (STNN) for stock prediction. Principal Component Analysis (PCA) is used to identify principal components. Furthermore, in order to ensure PCA-STNN predictive performance, the model is compared with Back Propagation Neural Network (BPNN), PCA-BPNN and STNN. But the predictive performance is not satisfactory when there is a large fluctuation period in time series. Nelson, Pereira & De Oliveira (2017) adopted deep neural networks for the stock market prediction. Their work is based on LSTM which can be further enhanced by adopting different preprocessing techniques. Arratia & Sepúlveda (2019) adopted the CNN model. The financial time series data is converted into images and then passed into CNN which produced improved classification accuracy for stock market prediction. The work performed by them is encouraging. However, they have used cross validation for time series data according to the work of ( Bergmeir, Hyndman & Koo, 2018 ) although the time series under consideration was not artificial and stationary.

The above literature review is summarized in Table S1 . This review exposes the superiority of SVM and ANN over other machine learning techniques. It is deduced that statistical methods don't perform better time series analysis as compared to ML approaches like SVM and ANN. Furthermore, DL based techniques like CNN and LSTM tackle nonlinearities and volatilities better than shallow learning based ML techniques.

Stock trend prediction using textual data

Unstructured form of textual data makes it difficult for data mining techniques to mine information from text. Moreover, these text mining techniques can be further classified as fact mining and opinion mining techniques. In this section, literature regarding text processing techniques is discussed and divided into three groups: shallow feature based text processing, event extraction based text processing, and sentiment analysis.

Shallow feature based text processing

Cho, Wüthrich & Zhang (1999) proposed text processing techniques based on keyword record counting. They used a fixed set of news stories. In these news articles, they searched around 400 keywords which were provided by market experts. They only considered words statistics and ignored their semantics. They examined several text processing methods and their effectiveness in forecasting financial markets.

Schumaker & Chen (2009) analyzed financial news articles using shallow features based textual representation approaches like Bag-of-Words (BoW), noun phrases, named entities, and proper noun schemes. They input different feature types in SVM classification to study their impact and recognize the superiority of proper noun schemes as compared to other text processing techniques. But they used small dataset that is not enough for in depth analysis.

Dadgar, Araghi & Farahani (2016) proposed a news classification approach using Term Frequency–Inverse Document Frequency (TF-IDF) for feature extraction and SVM as classification scheme. They compared it with other classification methods and found desirable results using their proposed approach. However, text preprocessing phase can be improved further by using word normalization techniques.

Li et al. (2016) proposed a stock market prediction scheme using Extreme Learning Machine (ELM) for rapid forecasting. ELM is a type of feed forward neural network with one hidden layer. ELM doesn’t employ gradient based methods for parameter optimization. They have used a news filtering scheme but a situation where multiple news for a stock occur in the same time window are not handled properly and discarded from filtered news dataset.

Groth & Muntermann (2011) used BoW and applied NB, k-NN, SVM and NN to learn patterns from text. In order to generalize prediction model, size of the dataset was not enough. Hagenau, Liebmann & Neumann (2013) used SVM prediction model and input textual feature set along with technical data. They showed that prediction performance is enhanced by using their feature extraction techniques where extracted BoW feature set is refined using market feedback. However, text preprocessing steps are not extensive only stop word removal technique is employed in the text preprocessing step.

Zhai, Hsu & Halgamuge (2007) used two news dataset where one is for general market and the other is for specific stock. Textual features are extracted using BoW and then extracted words are replaced with their high level concept using WordNet. These concepts are weighted using TF-IDF scheme. Selected textual features are combined with technical indicators and passed into SVM model for predictions but experiments are performed on small dataset.

Luss & D’Aspremont (2015) utilized textual and numerical data for intraday price prediction. They employed Multiple Kernel Learning (MKL) for price prediction and BoW to extract text features. These features are selected using dictionary of word’s stem form that reduced dimensionality of feature vector. But dictionary contained small set of words that limited the prediction performance.

Vargas, De Lima & Evsukoff (2017) combined CNN with LSTM and used word and sentence embedding for stock prediction. They have compared their results with ( Ding et al., 2015 ) and showed that performance became slightly better for word and sentence embedding. They have used hybrid prediction model and technical indicators but preprocessing step for financial time series is missing. Furthermore, the comparative analysis of experimental results using CNN and CNN combined with LSTM, showed that features with enhance semantic significantly contribute in improving prediction accuracy.

In Garcia-Lopez, Batyrshin & Gelbukh (2018) , the relationship between tweets and stock trend is captured while BoW and word embedding are used as a textual representation. BoW vocabulary size is justified by analyzing results for different vocabulary size but Word2Vec embedding size parameter is not tested for different values. It is shown that word embedding outperformed BoW. Yun, Sim & Seok (2019) used titles of news articles in Korean language and extracted features using word embedding. Features were passed into the CNN model and it produced 53% accuracy. In this research, authors have not discussed about hyper parameter’s tuning. The discussed literature about shallow features is summarized in Table S2 .

Event extraction based text processing

Event extraction approaches are based on large data corpus and do not depend on domain expertise. These approaches are used in many domains like finance, security, and war etc. In data-driven event extraction, clustering is performed to group documents or sentences that refer to the same event ( Naughton, Kushmerick & Carthy, 2006 ; Tanev, Piskorski & Atkinson, 2008 ). Ding et al. (2014) and Ding et al. (2015) used data-driven event extraction based stock prediction techniques. Structured events are extracted using OpenIE technology. Extracted events are generalized using two ontologies (WordNet and VerbNet). Then linear prediction models are compared with non-linear prediction models for capturing the hidden relationship between financial events and stock trend. They found that events based feature set performed better than BoW based feature set using non-linear prediction models. But the sparseness of event based features set limited the prediction accuracy. In their next work, extracted events are further processed by using event embedding. These embedded events are used to produce textual features for CNN prediction technique. It is observed that event-embedding based document representation improves the prediction accuracy more than discrete event based document representation. The proposed event embedding technique is based on word embedding of the elements of an event. This event embedding technique cannot capture the relationship between two semantically same events if word vectors are not same. In Ding et al. (2016) , the issue is addressed by introducing background knowledge. But they should extend their work for financial domain related knowledge graph. Deng et al. (2019) employed the similar technique for event embedding using knowledge graph to refine event embedding. For prediction, Temporal Convolutional Network (TCN) is employed that outperformed other deep learning models especially for abrupt changes of stock trend. However, accuracy is not the only metric to show the worth of prediction model. The prediction model should also be analyzed for the time and memory it takes for training.

Event extraction based on expert knowledge with small data corpus improves search performance of information extraction technique ( Borsje, Hogenboom & Frasincar, 2010 ; Hogenboom et al., 2013 ). Chen et al. (2019) extracted fine grained events automatically using a finance event dictionary built by domain experts. They proposed a professional financial event dictionary that contains all main financial events along their roles, and trigger words. By using this dictionary, events are extracted along their semantics automatically. This fine grained event significantly improves the prediction performance. However, the shared effect of stock and news is distributed between parameters of prediction model. So, it is difficult to extract this learned effect in order to use it in other financial problems.

For pattern based event extraction, lexico-semantic patterns are better than lexico syntactic ( IJntema et al., 2012 ). Author proposed lexico-semantic pattern language that make use of patterns to identify semantics from text. It is preprocessed before pattern matching in text. They performed experiments for finance and the political domain and found results with good precision and recall. But the extracted events are not timestamped, which is a mandatory feature in other domains like financial domain especially for trend prediction.

In Nuij et al. (2013) a framework is proposed for automatic extraction of news events from news messages. Knowledge based event extraction is performed using ViewerPro tool. Then extracted events impact and technical indicators are used in stock trading strategies. These strategies take the form of rules that use technical indicators with news variables. Genetic programming is used to reveal intricate rules based on news-based signals and technical indicators. Extracted events are not analyzed thoroughly like understanding the relationship between multiple events occurring in same time window so that their collective impact can be inferred on stock trend.

Many knowledge based event extraction techniques require data-driven processing steps like initial clustering hence these approaches are combined with their pros and cons. Hybrid event extraction techniques are used in many domains like biomedical, politics, weather etc., ( Jungermann & Morik, 2008 ; Björne et al., 2010 ). However, hybrid approach increases complexity by utilizing multiple techniques and requires high expertise to deal with. Event extraction based text processing is summarized in Table S3 .

Sentiment analysis based text processing

Stock prediction using sentiment analysis is an attractive area of research as it gives deeper analysis of textual data. In Sehgal & Song (2007) , Yahoo financial message board is used as a source of textual data for predicting stock trend. They inferred public sentiments from web messages and proved its correlation with stock trend. Naïve bayes, decision tree, and bagging algorithms are used as prediction algorithm. They also added an important contribution in terms of trust value parameter. It is calculated using author’s past performance related to correct predictions. On the basis of trust value unreliable sentiments are filtered which further enhances prediction accuracy. However, they have only considered past day web sentiments while there are web sentiments that have long term correlation with stock value.

It is shown by experiments in Wu et al. (2012) that the use of sentiment analysis based features along with technical indicators enhances the prediction performance. They used pointwise mutual information (PMI) measurement to extract sentiment analysis based features. PMI measures strength of semantic association between words and seed words from positive and negative class. The proposed technique for features extraction from stock news captures effective features which enhances prediction performance. But technical analysis can be improved further by examining different combinations of technical indicators.

De Fortuny et al. (2014) proposed a stock prediction model with SVM as a classifier. The input set is comprised of technical indicators and textual features. Textual features are extracted using BoW and sentiment analysis for different scopes of textual data. However, sentiment analysis only considered adjectives in text. By performing multiple experiments, they have given strong prove that stock prediction model performed better than random guessing.

Initially, sentiment analysis is tackled as a standard document classification problem, but soon it was realized that it requires some established knowledge base in the form of rules and vocabulary. Besides machine learning based sentiment analysis, lexicon based sentiment analysis is considered as a key resource for sentiment analysis.

In the beginning, sentiment lexicon was created manually with small data size ( Loughran & McDonald, 2011 ; Hutto & Gilbert, 2014 ). There is a need to automate the creation process of sentiment lexicon in order to increase lexicon data size. SenticNet5 ( Cambria et al., 2018 ) contains around 100k concepts along with relationships between these concepts. They automatically discovered conceptual primitives from text and linked them to commonsense concepts and named entities using symbolic and sub-symbolic AI. Recent studies show that these large scale lexicons provide significant improvement in performance. In Jovanoski, Pachovski & Nakov (2016) , using mid-sized manually crafted lexicon as seeds, a large scale lexicon is built using bootstrapping. It is shown that mid-sized seeds lexicon produces high quality lexicon than using small-sized seed lexicon. By using Sentprop, a framework to induce sentiment polarity for specific domain, ( Kreutz & Daelemans, 2018 ) adapted a general purpose sentiment lexicon for use in one specific domain.

Sentiment lexicon is a powerful tool that provides sentiment analysis an establish foundation. In Deng et al. (2011) , SentiWordNet 3.0 was used as a sentiment lexicon to perform sentiment analysis for news and comments scrapped from social network. They performed technical analysis on stock price data for feature extraction. Finally, all types of features are input into multiple kernel learning regression model (MKL) to perform prediction. Evaluation results showed that multiple type of features enhanced prediction performance. For sequence learning, neural network based model like LSTM performs better than MKL.

Li et al. (2014) have adopted Harvard IV-4 dictionary and Loughran-McDonald financial dictionary to construct the sentiment dimensions. News articles are then quantitatively projected onto sentiment dimensions. Prediction models with sentiment analysis outperformed the BoW models at index, sector, and stock level. Performance difference between Harvard IV-4 dictionary and Loughran-McDonald financial dictionary is not noticeable. Techniques should be employed to automatically expand sentiment dictionaries. In Li, Wu & Wang (2020) , technical indicators are combined with news sentiment scores. Sentiment analysis is performed using general purpose and domain specific sentiment lexicon. Prediction model showed better result with domain specific lexicon than other lexicons. Multiple news sentiments within the same day are averaged to input in prediction model. But there should be a way to identify strength of each news so that their weighted average can be taken.

Allen, McAleer & Singh (2017) used DJIA news sentiment scores provided by Thomson Reuters. They analyzed the relationship between DJIA component companies stock prices and financial news sentiment scores using entropy based measures. In Deveikyte et al. (2020) , news headlines from FTSE100 is obtained from RavenPack. Furthermore, tweets and news stories for FTSE100 are also collected from other sources. They have assessed the relationship between stock market movement and sentiment score. They have also conducted a correlation analysis using Granger’s causality test. Both articles revealed the significance of sentiment score in estimating stock market behavior. However, dataset size was not enough for reliable results.

Machine learning techniques for stock prediction using numerical and textual data

In literature, machine learning based stock prediction techniques are divided into shallow learning and deep learning techniques. Research papers for stock prediction are discussed in earlier section in the context of text mining approaches. In this section, these research papers are outlined under machine learning categories.

Shallow learning techniques for stock prediction

Shallow Learning techniques provide predictive models with very few numbers of composition layers. It requires features that have already been processed well. It can work well with small dataset ( Pasupa & Sunhem, 2016 ). SVM and Artificial Neural Network (ANN) with one or two incorporated hidden layers are examples of shallow learning techniques. In Schumaker & Chen (2009) and Dadgar, Araghi & Farahani (2016) , news classification approaches are proposed using different shallow features based textual representation and SVM machine learning models. Groth & Muntermann (2011) used NB, k-NN, SVM and NN for text classification. Zhai, Hsu & Halgamuge (2007) , Hagenau, Liebmann & Neumann, 2013 and Luss & D’Aspremont (2015) used SVM model for stock prediction. Garcia-Lopez, Batyrshin & Gelbukh (2018) used Naive Bayes, Support Vector Machines, K-nearest neighbors, Logistic Regression, and multilayer perceptron. BoW and word embedding are used for text representation while classifiers performed better with word embedding than BoW.

Deep learning techniques for stock prediction

Deep learning techniques as compared to ANN, have several hidden layers. Deep learning algorithms require a big dataset in order to tackle over-fitting as well as a high performance computing unit like GPUs ( Pasupa & Sunhem, 2016 ). The promising feature of deep learning techniques is that it can extract features from data through learning. CNN and Recurrent Neural Network (RNN) are types of deep learning techniques. Vargas, De Lima & Evsukoff (2017) proposed a Recurrent Convolutional Neural Network model (RCNN) to take advantage of both models. CNN better extracts semantic information from text and RNN is better in catching context information. The results showed that the proposed model RCNN performance is better than CNN. In Ding et al. (2015) , evaluation results showed that CNN is superior to standard feedforward neural networks and SVM. Deng et al. (2019) proposed Knowledge Driven Temporal Convolutional Network (KDTCN) model based on 1-D Fully-Convolutional Network (FCN) architecture. KDTCN outperformed other advanced models for stock prediction. In Jin, Yang & Liu (2019) , a hybrid model is proposed that used stock prices data and investors sentiments. The proposed model adopted EMD as a frequency decomposition technique in order to deal with non-stationary stock price time series data. While LSTM with attention mechanism verified the superiority of the hybrid model in terms of prediction accuracy.

Stock market prediction is automated using two approaches: technical analysis and fundamental analysis. Technical analysis relies on historic price data, volume of transactions and their derived attributes. While fundamental analysis not only considers technical data but also incorporates economic and political factors that have direct impact on stock market. These factors can be captured from textual data in the form of news, tweets, company performance reports, etc.

The discussion starts in the context of technical analysis by considering stock price data as a time series data that contains non-linearity and volatility or noise. Initially, technical analysis adopts statistical approaches for stock prediction. These statistical approaches are limited in perceiving non-linearity, noise and dynamics between stocks. Instead machine learning and AI based techniques especially deep learning has brought substantial advancement to deal with stock price data. However, despite this feature of machine learning techniques, statistical methods are not entirely discarded from the set of opportunities. Likewise, EMD and its variants are used in literature as a preprocessing of non-linear and noisy data ( Yujun, Yimei & Jianhua, 2020 ).

Many de-noising techniques have been proposed to filter time series data like Independent Component Analysis (ICA) and Wavelet Transform (WT) ( Wang et al., 2011 ; Dai, Wu & Lu, 2012 ; Liang et al., 2019 ). Furthermore, segmentation is a pre-processing step that represents time series with less data and identifies technical patterns which facilitates data analysis ( Cavalcante et al., 2016 ). This discussion about analyzing stock price data deduces that stock price data should be analyzed by combining the strength of statistical and deep learning techniques. Forecasting using hybrid approaches improves accuracy significantly.

On the other hand, fundamental analysis deals especially with textual data. There are two types of approaches that mine information from textual data. First category is related to fact finding techniques and the other one is opinion mining techniques. In literature review, two prominent approaches are discussed for extracting facts from text that are shallow and event based features. Most earlier research work adopted text processing using shallow features like bag-of-words, noun phrases, and named entities etc. These techniques are basically different representations of text vocabulary with high dimensionality and without semantics ( Schumaker & Chen, 2009 ; Groth & Muntermann, 2011 ). Then some techniques are employed for shallow feature’s dimensionality reduction for instance, knowledge bases and generalized concepts ( Zhai, Hsu & Halgamuge, 2007 ; Luss & D’Aspremont, 2015 ). Word embedding which is another text representation technique reduces the dimensionality by allowing words with similar meaning to have similar representation ( Vargas, De Lima & Evsukoff, 2017 ; Garcia-Lopez, Batyrshin & Gelbukh, 2018 ). Later on, an entity relationship based technique is adopted for information extraction from text. So textual data is represented as structured events which contain entity-relation information ( Ding et al., 2014 ). The use of domain knowledge along with event extraction method refines this process and provides opportunity to use less training data with interpretable and traceable results ( Hogenboom et al., 2016 ; Chen et al., 2019 ). Moreover, event embedding gives the distributed representation of structured events which significantly reduces the issue of high dimensionality. While the use of knowledge base along with event embedding further refines textual feature extraction process ( Ding et al., 2016 ; Deng et al., 2019 ).

Alternatively, sentiment analysis is another way to mine information about opinion in text. Using machine learning algorithms, sentiment analysis is performed as a document classification problem ( De Fortuny et al., 2014 ). Another approach adopts sentiment lexicon to identify sentiment score. Sentiment lexicon provides an established foundation and it is mostly adopted for sentiment analysis. While sentiment lexicons for financial domain can enhance stock prediction accuracy like LM ( Loughran & McDonald, 2011 ). But its limited size doesn’t impact well on prediction accuracy. So, there are approaches that automate the process of increasing the size of sentiment lexicon by taking into account existing manually created data as a seed word set ( Jovanoski, Pachovski & Nakov, 2016 ; Kreutz & Daelemans, 2018 ). In financial domain, lexicon based sentiment analysis is performed to extract features from textual data ( Li, Wu & Wang, 2020 ).

Features extracted from numerical and textual data are aligned and input into prediction algorithm. Deep learning has more potential than shallow learning techniques to capture the complex hidden relationship between textual and numerical data. The promising feature of deep learning techniques is that they can extract features from data through learning. The work of Ding et al. (2015) , Ding et al. (2016) , Vargas, De Lima & Evsukoff (2017) , Chen et al. (2019) , Deng et al. (2019) and Li, Wu & Wang (2020) shows the strength of deep learning for prediction along with event base textual representation and sentiment analysis based features ( Picasso et al., 2019 ; Li, Wu & Wang, 2020 ). Solutions of the challenges in implementing news sensitive stock prediction are discussed in this section and are summarized in Table 2 .

Finally, Table 3 outlines state of the art techniques in the context of news sensitive stock prediction model. By observing this table, it can be deduced that hybrid approaches for feature extraction and prediction perform better by combining strengths of different approaches.

Open issues and research directions

Stock prediction has been an attractive research area for many years. The large amount of textual and numerical data is available to extract significant information and utilize it in prediction techniques. A lot of efficient techniques have been proposed to support data processing and decision making in this area. However, there are a few open problems which are discussed below:

News preprocessing

Text preprocessing is a non-trivial step before feature selection. There are different techniques that exist for text preprocessing. For instance, removing stop words, lowercasing, stemming, and lemmatizing etc. In ( Symeonidis, Effrosynidis & Arampatzis, 2018 ), comparative experiments are performed on twitter dataset with 16 different preprocessing techniques. This work investigates the significance of these techniques when they are used simultaneously or with different combinations. Twitter dataset contains a significant amount of noise as compared to the text written in a more formal way like news dataset. Preprocessing is also a considerable step for text with less noise in order to reduce feature dimensions and ignore meaningless data. So a thorough comparative evaluation of preprocessing techniques for news dataset will be valuable for the research community, especially dealing with news for stock trend prediction.

News categorization

In news sensitive stock prediction justified grouping of information plays an important role for efficient information retrieval. For the stock market, news should be properly categorized so that their impact on the stock market can be captured effectively. For instance, news related to politics, terrorism, foreign affairs, finance, economic, and government policies etc., should be categorized accordingly. Furthermore, news related to the stock market should be further categorized into general stock market news, sector related news, and specific stock related news. Supervised techniques are used for news categorization where enough training data exists for acceptable categorization accuracy. But if there is no training data then news is categorized using manual effort. Although unsupervised techniques exist with minimal manual efforts ( Usmani & Shamsi, 2020 ), there is still a need for further efforts required to fully automate these techniques.

Intensity of news impact

News influences the stock market with different intensities. So there should be a way to estimate the intensity of news impact on stock market. Moreover, it should also be investigated that for how long news has its impact on stock market and how the intensity of news impact gradually decreases over time.

Impact of bad and good news

On a stock trading day, there can be multiple news with different polarizations that influence market trading simultaneously. For instance, there might be some news that has a positive impact and at the same time some news has a negative impact on the stock market. So estimating their collective effect is also an open research problem in this domain.

News weighted impact

News from different categories have different influence on the stock market. For instance, regional news has more potential to impact the stock market than global news. In order to gain more insights, news categories should be incorporated and trained separately in the prediction model along with their learned weights.

Other sources for textual data

Recently a lot of work is being done to capture impact on stock market from news and twitter data. However, integration of additional resources of data like a quarterly or annual reports from a company can be used to further enhance the prediction accuracy. The companies registered in the stock market are required to report at frequent intervals. These reports publish a company activity and financial performance which can help to comprehend future stock trend.

Hybrid deep learning models

The application of stock prediction by employing Deep Neural Network (DNN) is gaining attention dramatically for many years. CNN and LSTM are examples of deep learning models used for stock prediction ( Liang et al., 2019 ; Yun, Sim & Seok, 2019 ). However, in Thakkar & Chaudhari (2021) worth of hybrid DNN approaches are discussed which improved prediction accuracy significantly. Vargas, De Lima & Evsukoff (2017) and Liu et al. (2020) proposed hybrid deep learning models by combining the worth of LSTM and CNN. Consequently, there is a need to investigate more sophisticated deep learning approaches by combining basic deep learning models.

This paper presents an extensive study of stock trend prediction using news and stock prices. It presents a generic approach to implement news sensitive stock prediction model and identifies three main phases. In each phase, challenges are identified and in search of opportunities existing literature is reviewed.

This work has four major contributions. The first contribution of this paper is to provide literature review on this topic. This work elaborates existing research paper and assesses their strengths and limitations. The discussion about existing literature is classified according to different phases in stock prediction namely: (i) forecasting using time series data, (ii) forecasting using financial time series and textual data, (iii) preprocessing and feature extraction in textual and numerical data, (iii) techniques for stock trend prediction using numerical and textual features.

The second contribution is a discussion about key concepts in this scenario that will significantly improve the reader’s comprehension. The third contribution is the identification of challenges and their state of the art solutions in the context of news sensitive stock prediction.

Furthermore, discussion about open issues and future research direction is another important contribution of this survey which is noteworthy for the research community.

Supplemental Information

Supplemental information 1, supplemental information 2, supplemental information 3, supplemental information 4, supplemental information 5, supplemental information 6, supplemental information 7, funding statement.

This research work is supported by the Higher Education Commission (HEC), Islamabad, Pakistan. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Additional Information and Declarations

Shazia Usmani and Jawwad A Shamsi declare that they have no competing interests.

Shazia Usmani performed the computation work, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.

Jawwad A Shamsi authored or reviewed drafts of the paper, and approved the final draft.

Artificial Intelligence Applied to Stock Market Trading: A Review

Ieee account.

Change Username/Password
Update Address

Purchase Details

Payment Options
Order History
View Purchased Documents

Profile Information

Communications Preferences
Profession and Education
Technical Interests
US & Canada: +1 800 678 4333
Worldwide: +1 732 981 0060
Contact & Support
About IEEE Xplore
Accessibility
Terms of Use
Nondiscrimination Policy
Privacy & Opting Out of Cookies

A not-for-profit organization, IEEE is the world's largest technical professional organization dedicated to advancing technology for the benefit of humanity. © Copyright 2024 IEEE - All rights reserved. Use of this web site signifies your agreement to the terms and conditions.

KPMG Personalization

2024 Federal budget highlights

Canada's Deputy Prime Minister and Finance Minister Chrystia Freeland delivered Canada's 2024 federal budget on April 16, 2024

Share Share close
Download 2024 Federal budget highlights pdf Opens in a new window
1000 Save this article to my library
View Print friendly version of this article Opens in a new window
Go to bottom of page
Home ›
Insights ›

The budget expects a deficit of $40.0 billion for 2023-24 and forecasts deficits of $39.8 billion for 2024-25, and $38.9 billion for 2025-26. Although the budget does not change the federal personal or corporate tax rates, Finance announced that it will increase the inclusion rate for capital gains realized on or after June 25, 2024, in certain circumstances. Specifically, the inclusion rate will increase to 2/3 for corporations and trusts and to 2/3 for individuals on the portion of capital gains realized in the year that exceed $250,000. Finance also provided details on related changes to increase the Lifetime Capital Gains Exemption to $1.25 million, and clarified a temporary tax exemption on up to $10 million in capital gains that may be realized when a business is sold to an Employee Ownership Trust (EOT).

One of the major themes of this year’s budget is to increase the affordability of housing. In particular, the budget provides several incentives for purpose-built rental housing in Canada, including an elective exemption from the interest deductibility limitation and enhanced Capital Cost Allowance (CCA) for certain new additions of property. There are also new measures intended to benefit first-time homebuyers. Finance’s budget also focuses on clean economy changes, including to provide expected detail on the Clean Electricity investment tax credit, among others.

Download this edition of the TaxNewsFlash to learn more.

How new measures announced in this year’s budget may impact Canadians and businesses

KPMG in Canada provides the latest Canadian tax news and international tax news for you and your business

KPMG in Canada provides the latest Canadian and international tax news for your business

Quick tax information for corporations and individuals

IMAGES

(PDF) LITERATURE REVIEW OF STOCK MARKET
Basics Of Stock Market Arvind Arora Book [PDF]
(PDF) A Survey on Stock Market Prediction Using Machine Learning Techniques
(PDF) Stock Market Analysis: A Review and Taxonomy of Prediction Techniques
(PDF) Stock Market Volatility and Return Analysis: A Systematic
(PDF) A critical review of stock market development in India

VIDEO

Option trading Strategy.Stock market PDF.Risk hai to Ishk hai
THIS IS THE RIGHT TIME TO BUY THESE STOCKS
Basics of Stock Market| What to Learn as a Beginner |Detailed Tutorial 1
For Hindi share market pdf #stockmarket #optionstrading #intraday #trending #viral #banknifty #nse
Stock Market Trading Journal
stock market for beginners

COMMENTS

(PDF) Stock Markets: An Overview and A Literature Review
A stock exchange, also called a securities exchange or. bourse is the name given to the facility for engaging in buying and selling of shares of. stock or bonds or other financial instruments. For ...
Stock Market Liquidity: A Literature Review
Kumar and Misra (2015) evaluated 95 articles and presented a review of literature on various aspects of stock market liquidity like measurement of liquidity, determinants of liquidity, intraday movements, and liquidity effects on firm value. Recently, Díaz and Escribano (2020) reviewed 177 articles and discussed the dimensional liquidity ...
Stock market movement forecast: A Systematic review
A comprehensive systematic review on the stock market forecast is presented. Specific inputs and outputs in the stock market forecast are explored in depth. Comparison among different types of models for stock market forecast is provided. The analysis of predictive inputs combines fundamental and technical analysis.
Review Machine learning techniques and data for stock market
In this literature review, we investigate machine learning techniques that are applied for stock market prediction. A focus area in this literature review is the stock markets investigated in the literature as well as the types of variables used as input in the machine learning techniques used for predicting these markets.
Stock Markets: An Overview and A Literature Review
Stock markets are without any doubt, an integral and indispensable part of a country's economy. But the impact of stock markets on the country's economy can be different from how the other countries' stock markets affect their economies. This is because the impact of stock markets on the economy depends on various factors like the organization of stock exchanges, its relationship with ...
A systematic review of fundamental and technical analysis of stock
The stock market is a key pivot in every growing and thriving economy, and every investment in the market is aimed at maximising profit and minimising associated risk. As a result, numerous studies have been conducted on the stock-market prediction using technical or fundamental analysis through various soft-computing techniques and algorithms. This study attempted to undertake a systematic ...
PDF A Literature Review of the Stock Market Integration
A Literature Review of the Stock Market Integration DOI: 10.1057/9781137381705 1.1 The role of stock market integration in global financial integration Financial market integration is a central theme in international finance, and it is considered as a key factor in delivering competitiveness, effi-ciency and growth.
Stock Market Volatility and Return Analysis: A Systematic Literature Review
However, the main purpose of this review is to examine effective GARCH models recommended for performing market returns and volatilities analysis. The secondary purpose of this review study is to conduct a content analysis of return and volatility literature reviews over a period of 12 years (2008-2019) and in 50 different papers.
PDF Stock Markets An Overview WP
Literature Review A., Rjumohan April 2019 Online at https://mpra.ub.uni-muenchen.de/101855/ MPRA Paper No. 101855, posted 15 Jul 2020 13:12 UTC. 12 ... The early 1990s saw a surge in stock market trading with a majority of . 18 the general public investing in stocks by borrowing money. This slowly resulted in a
LITERATURE REVIEW OF STOCK MARKET
See Full PDFDownload PDF. THE IMPACT OF STOCK MARKET PERFORMANCE ON THE GROWTH OF NIGERIAN ECONOMY. Donexbefree Abdul. This study is motivated primarily by the need to enhance capital accumulation from the stock market, being the long term end of the financial system. This study is an investigation of the impact of Nigeria stock exchange ...
Text-based Stock Market Analysis: A Review
In this. study, we provide a review on the immense amount of existing literature of text-based stock market analysis. We present input data. types and cover main textual data sources and variations. Feature representation techniques are then presented. Then, we cover the.
Stock Market Volatility and Return Analysis: A Systematic Literature Review
This research provides a literature review using a systematic database to examine and cross-reference snowballing and examines effective GARCH models recommended for performing market returns and volatilities analysis. In the field of business research method, a literature review is more relevant than ever. Even though there has been lack of integrity and inflexibility in traditional ...
Stock Market Liquidity: A Literature Review
The earlier literature review papers on stock market liquidity have considered an inconsistent number of studies from various research databases while some have examined a few studies that highlight different perspectives of stock market liquidity. ... PDF/ePub View PDF/ePub. Get access. Access options. If you have access to journal content via ...
PDF Stock Market and Investment: Is the Market a Sideshow?
the stock market might predict investment, and how investor sentiment might itself influence investment through the stock market. In the third section, we describe the tests that we use to discover how the stock market influences investment. The fourth and fifth sections present evidence using firm-level data from the COMPUSTAT data base bearing
Stock Markets: An Overview and A Literature Review
A., Rjumohan, 2019. " Stock Markets: An Overview and A Literature Review ," MPRA Paper 101855, University Library of Munich, Germany. Downloadable! Stock markets are without any doubt, an integral and indispensable part of a country's economy. But the impact of stock markets on the country's economy can be different from how the other ...
Stock Market Volatility and Return Analysis: A Systematic Literature Review
In the field of business research method, a literature review is more relevant than ever. Even though there has been lack of integrity and inflexibility in traditional literature reviews with questions being raised about the quality and trustworthiness of these types of reviews. This research provides a literature review using a systematic database to examine and cross-reference snowballing ...
Determinants of stock market development: a review of the literature
Based on the theoretical literature, the determinants of stock market development can be broadly classified into two groups: macroeconomic factors and institutional factors. The theory and the empirics predict different ways in which macroeconomic factors affect stock market development. The real income and its growth rate foster stock market ...
News sensitive stock market prediction: literature review and
This paper presents an extensive study of stock trend prediction using news and stock prices. It presents a generic approach to implement news sensitive stock prediction model and identifies three main phases. In each phase, challenges are identified and in search of opportunities existing literature is reviewed.
Artificial Intelligence Applied to Stock Market Trading: A Review
This paper presents a systematic review of the literature on Artificial Intelligence applied to investments in the stock market based on a sample of 2326 papers from the Scopus website between 1995 and 2019. These papers were divided into four categories: portfolio optimization, stock market prediction using AI, financial sentiment analysis ...
Full article: Organizational culture: a systematic review
2.1. Definition of organizational culture. OC is a set of norms, values, beliefs, and attitudes that guide the actions of all organization members and have a significant impact on employee behavior (Schein, Citation 1992).Supporting Schein's definition, Denison et al. (Citation 2012) define OC as the underlying values, protocols, beliefs, and assumptions that organizational members hold, and ...
2024 Federal budget highlights
The budget expects a deficit of $40.0 billion for 2023-24 and forecasts deficits of $39.8 billion for 2024-25, and $38.9 billion for 2025-26. Although the budget does not change the federal personal or corporate tax rates, Finance announced that it will increase the inclusion rate for capital gains realized on or after June 25, 2024, in certain circumstances.