Machine learning challenges and impact: an interview with Thomas Dietterich

Zhi-Hua Zhou is a professor at Nanjing University in China


Zhi-Hua Zhou, Machine learning challenges and impact: an interview with Thomas Dietterich, National Science Review, Volume 5, Issue 1, January 2018, Pages 54–58, https://doi.org/10.1093/nsr/nwx045


Machine learning is the driving force of the current wave of artificial intelligence (AI). In an interview with NSR, Prof. Thomas Dietterich, Distinguished Professor Emeritus of computer science at Oregon State University in the USA, former president of the Association for the Advancement of Artificial Intelligence (AAAI, the most prestigious association in the field of artificial intelligence) and founding president of the International Machine Learning Society, talked about exciting recent advances and technical challenges of machine learning, as well as its big impact on the world.

NSR: Why is machine learning useful?

Dietterich: Machine learning provides a new method for creating high-performance software. In traditional software engineering, we talk with the users, formulate the requirements and then design, implement and test algorithms for achieving those requirements. With machine learning, we still formulate the overall goal of the software system, but instead of designing our own algorithms, we collect training examples (usually, by having people label data points) and then apply a machine learning algorithm to automatically learn the desired function.
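To make that workflow concrete, here is a minimal sketch of training a classifier from labeled examples instead of hand-coding the logic. scikit-learn and its bundled digits data set are assumptions chosen for illustration; the interview names no particular tools.

```python
# Minimal sketch of the workflow described above: collect labeled examples and
# let a learning algorithm fit the desired function, instead of hand-coding rules.
# scikit-learn and the toy digits data set are illustrative assumptions only.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)           # "training examples": inputs plus human-provided labels
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=5000)     # the learning algorithm
model.fit(X_train, y_train)                   # automatically learn the desired function

print("held-out accuracy:", model.score(X_test, y_test))
```

The held-out test set plays the role of the "testing" step in traditional software engineering: it checks whether the learned function meets the overall goal we formulated.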

This new methodology allows us to create software for many problems that we were not able to solve using previous software engineering methods. In particular, the performance of previous methods for visual object detection and recognition, speech recognition and language translation were not good enough to be usable. But with recent advances in machine learning, we now have systems that can perform these tasks with accuracy that matches human performance (more or less).

Machine learning is therefore providing a key technology to enable applications such as self-driving cars, real-time driving instructions, cross-language user interfaces and speech-enabled user interfaces. Machine learning is also valuable for web search engines, recommendation systems and personalized advertising. Many people predict that machine learning methods will lead to a revolution in medicine, particularly in the automatic collection and analysis of medical images. Machine learning is also a promising tool for many operational aspects of modern companies. For example, machine learning can help predict customer demand and optimize supply chains. It is also a key technology for training robots to perform flexible manufacturing tasks.

NSR: Why is machine learning important to the science community and to society?

Thomas Dietterich, professor at Oregon State University (Courtesy of Thomas Dietterich).

Dietterich: Machine learning methods can be helpful in data collection and analysis. For example, machine learning methods are applied to analyse the immense amount of data collected by the Large Hadron Collider, and machine learning techniques are critical to analysing astronomical data. Machine learning techniques can help scientists decide which data points to collect by helping design experiments. And robotic systems can then automatically perform those experiments either in the lab or in the real world. For example, there is an Automated Scientist developed by Ross King that designs, executes and analyses its own experiments. Ocean-going glider robots are controlled by AI systems. And machine learning techniques are starting to be applied to control drones that collect data in ecosystems and in cities.

With recent advances in machine learning, we now have systems that can [recognize objects in images, recognize speech, and translate languages] with accuracy that matches human performance. — Thomas Dietterich

My own research focuses on applying machine learning to improve our management of the earth's ecosystems. For example, in Oregon, we have frequent forest fires caused by lightning. These fires can destroy habitat for endangered species and burn up trees that could have been used to build houses. One cause of these large fires is that for many years, the USA suppressed every fire. This is very expensive, and it allows fuel to accumulate in the forests so that when a new fire is started, it burns very hot and is much more damaging. We are applying machine learning (reinforcement learning) methods to find good rules for deciding which fires should be suppressed and which fires should be permitted to burn. These rules save money and help preserve endangered species.

Machine learning methods can also be applied to map the location and population of endangered species such as the panda. In the USA, we have developed new machine learning methods for predicting and understanding bird migration. Similar problems arise in mapping the spread of new diseases, of air pollution and of traffic.

In business and finance, machine learning methods can help identify fraud and theft. My group has been studying algorithms for anomaly detection that can identify unusual transactions and present them to a human analyst for law enforcement.
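As a hedged illustration of this kind of anomaly detection, the sketch below trains an isolation forest on synthetic "normal" transactions and flags an unusual one for review. The algorithm, features, and data are assumptions for illustration; they are not the specific methods studied by Dietterich's group.

```python
# Illustrative anomaly detection on synthetic transaction data. IsolationForest
# and the invented features below are assumptions, not the group's actual methods.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# features per transaction: [amount, hour of day, transactions in the last 24 h]
normal = np.column_stack([rng.lognormal(3, 0.5, 1000),
                          rng.normal(14, 4, 1000),
                          rng.poisson(3, 1000)])
suspicious = np.array([[5000.0, 3.0, 40.0]])   # large amount, 3 a.m., 40 recent transactions

detector = IsolationForest(contamination=0.01, random_state=0).fit(normal)
print(detector.predict(suspicious))            # -1 marks the point as anomalous for analyst review
```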

Machine learning methods can also contribute to the development of ‘Smart Cities’. I mentioned traffic management and pollution mapping. But machine learning techniques can also be applied to identify where new infrastructure is needed (e.g. water supply, electricity, internet). In the USA, machine learning has been applied to map the distribution of lead paint, which was used in older buildings in the 20th century and is a neurotoxin.

NSR: Could you comment on the strength and weakness of deep learning?

Dietterich: The most exciting recent development is the wave of research on deep learning methods. Most machine learning methods require the data scientist to define a set of ‘features’ to describe each input. For example, in order to recognize an object in an image, the data scientist would first need to extract features such as edges, blobs and textured regions from the image. Then these could be fed to a machine learning algorithm to recognize objects. Deep learning allows us to feed the raw image (the pixels) to the learning algorithm without first defining and extracting features. We have discovered that deep learning can learn the right features, and that it does this much better than we were able to hand-code those features. So in problems where there is a big gap between the inputs (e.g. images, speech signals, etc.) and the outputs (e.g. objects, sentences, etc.), deep learning is able to do much better than previous machine learning methods.
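The contrast Dietterich describes can be sketched as follows: a small convolutional network consumes raw pixels directly and learns its own features, rather than being fed hand-engineered edge, blob, or texture descriptors. PyTorch and the toy 28×28 input size are assumptions for illustration; no specific framework is named in the interview.

```python
# Sketch: a tiny convolutional network that learns features from raw pixels,
# replacing the hand-coded feature extraction step described above.
import torch
import torch.nn as nn

class TinyConvNet(nn.Module):
    def __init__(self, n_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(          # learned feature extractor (replaces hand-coding)
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 7 * 7, n_classes)

    def forward(self, x):                        # x: raw pixels, shape (batch, 1, 28, 28)
        return self.classifier(self.features(x).flatten(1))

logits = TinyConvNet()(torch.randn(4, 1, 28, 28))   # 4 fake 28x28 grayscale images
print(logits.shape)                                  # torch.Size([4, 10])
```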

However, there are still many problems where the features are easy to obtain. For example, in fraud detection, we might examine the number of credit card transactions, where and when they were initiated, and so on. These are already represented in high level features, and in such applications deep learning does not provide much, if any, benefit. Deep learning algorithms are also difficult to train and require large amounts of computer time, so in most problems, they are not the preferred method.

Deep learning is one particular method for machine learning. It is quite difficult to use, so in problems where features are available, it is generally much better to use methods such as random forests or boosted trees. These methods are very easy to use and require very little experience. They are also many orders of magnitude faster than deep learning methods, so they can run on a laptop or a smart phone instead of requiring a GPU supercomputer. An important goal of machine learning work is to make machine learning techniques usable by people with little or no formal training in machine learning. I am Chief Scientist of a company called BigML that has developed cloud-based machine learning services that are extremely easy to use. They are all based on decision tree methods (including boosting and random forests).
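The "easy to use" tree-ensemble route looks roughly like the sketch below, which fits a random forest to a small tabular data set with pre-computed features and runs comfortably on a laptop. scikit-learn and the toy breast-cancer data set are assumptions for illustration; this is not BigML's API.

```python
# Illustrative tree-ensemble baseline for a problem where features already exist.
# scikit-learn and the toy data set are assumptions; this is not BigML's service.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)    # 30 pre-computed tabular features per case
forest = RandomForestClassifier(n_estimators=200, random_state=0)
print("cross-validated accuracy:", cross_val_score(forest, X, y, cv=5).mean())
```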

There are also interesting ways to combine deep learning with standard AI techniques. The best example is AlphaGo, which combines deep learning (to analyse the patterns of stones on the Go board) and Monte Carlo tree search (to search ahead into the future of the game to determine the consequences of alternate moves). Similarly, self-driving cars combine top-level software (for safety, control, and user interface) with deep learning methods for computer vision and activity recognition.
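The general idea of pairing a learned evaluation with explicit lookahead can be sketched schematically. AlphaGo's real pipeline (policy and value networks plus Monte Carlo tree search) is far more elaborate; the functions `value_net`, `legal_moves`, and `apply_move` below are hypothetical placeholders, and the search shown is a simple depth-limited negamax rather than MCTS.

```python
# Schematic only: combine a *learned* position evaluator with explicit lookahead
# search. `value_net`, `legal_moves`, and `apply_move` are hypothetical placeholders;
# value_net is assumed to score a position from the perspective of the player to move.
def lookahead_value(state, depth, value_net, legal_moves, apply_move):
    """Depth-limited negamax; the learned model scores the leaf positions."""
    moves = legal_moves(state)
    if depth == 0 or not moves:
        return value_net(state)                   # deep learning judges the position
    # the search evaluates the consequences of each alternative move
    return max(-lookahead_value(apply_move(state, m), depth - 1,
                                value_net, legal_moves, apply_move)
               for m in moves)

def choose_move(state, depth, value_net, legal_moves, apply_move):
    return max(legal_moves(state),
               key=lambda m: -lookahead_value(apply_move(state, m), depth - 1,
                                              value_net, legal_moves, apply_move))
```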

NSR: Could you comment on the research challenges for machine learning?

Dietterich: There are many important research challenges for machine learning. The first challenge is to improve methods for unsupervised and reinforcement learning. Virtually all of the recent advances have been in so-called ‘supervised learning’, where a ‘teacher’ tells the computer the right answer for each training example. However, there are many problems where we lack teachers but where we have huge amounts of data. One example is when we seek to detect anomalies or fraudulent transactions. There is work in developing ‘anomaly detection’ algorithms that can learn from such data without the need of a teacher. More generally, there are many classes of ‘unsupervised’ learning algorithms that can learn without a teacher.

The most exciting recent development is the wave of research on deep learning methods. — Thomas Dietterich

Another area where more research is needed is in reinforcement learning. Reinforcement learning involves teaching a computer to perform a complex task by giving it rewards or punishments. In many problems, the computer can compute the reward itself, and this allows the computer to learn by trial and error rather than from a teacher's examples. Reinforcement learning is particularly valuable in control problems (such as self-driving cars, robots and the fire management problem I mentioned before). Methods for reinforcement learning are still very slow and difficult to apply, so researchers are attempting to find ways of speeding them up. Existing reinforcement learning algorithms also operate at a single time scale, and this makes it difficult for these methods to learn in problems that involve very different time scales. For example, the reinforcement learning algorithm that learns to drive a car by keeping it within the traffic lane cannot also learn to plan routes from one location to another, because these decisions occur at very different time scales. Research in hierarchical reinforcement learning is attempting to address this problem.
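A hedged toy example of this reward-based, trial-and-error learning is tabular Q-learning on a six-cell corridor, sketched below. The corridor problem and all constants are invented for illustration; note that the one-step update operates at a single time scale, which is exactly the limitation described above.

```python
# Toy reinforcement learning: tabular Q-learning on a 6-cell corridor where the
# agent must walk right to reach a reward. Invented for illustration only.
import numpy as np

N_STATES, ACTIONS = 6, (-1, +1)        # actions: move left / move right
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1
Q = np.zeros((N_STATES, len(ACTIONS)))
rng = np.random.default_rng(0)

for _ in range(2000):                  # learning episodes (trial and error)
    s = 0
    while s != N_STATES - 1:
        a = rng.integers(2) if rng.random() < EPS else int(Q[s].argmax())
        s_next = min(max(s + ACTIONS[a], 0), N_STATES - 1)
        r = 1.0 if s_next == N_STATES - 1 else 0.0    # reward computed by the environment itself
        # one-step temporal-difference update: a single time scale
        Q[s, a] += ALPHA * (r + GAMMA * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q.argmax(axis=1))   # learned policy: action 1 ("move right") in every non-terminal cell
```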

The second major research problem for machine learning is the problem of verification, validation and trust. Traditional software systems often contain bugs, but because software engineers can read the program code, they can design good tests to check that the software is working correctly. But the result of machine learning is a ‘black box’ system that accepts inputs and produces outputs but is difficult to inspect. Hence, a very active topic in machine learning research is to develop methods for making machine learning systems more interpretable (e.g. by providing explanations or translating their results into easy-to-understand forms). There is also research on automated methods for verification and validation of black box systems. One of the most interesting new directions is to create automated ‘adversaries’ that attempt to break the machine learning system. These can often discover inputs that cause the learned program to fail.
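One well-known instance of such an automated adversary is the fast gradient sign method, sketched below: it perturbs an input in the direction that most increases the model's loss, often exposing inputs on which the learned system fails. The `model`, `image`, and `label` objects are hypothetical placeholders, not a system from the interview.

```python
# Minimal fast-gradient-sign-method (FGSM) sketch of the "automated adversary" idea.
# `model`, `image`, and `label` are hypothetical placeholders.
import torch
import torch.nn.functional as F

def fgsm_attack(model, image, label, epsilon=0.05):
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # step each pixel slightly in the direction that hurts the model most
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.clamp(0.0, 1.0).detach()

# usage sketch (assumes `model` maps (batch, 1, 28, 28) images to class logits):
# adv = fgsm_attack(model, image, label)
# print(model(image).argmax(1), model(adv).argmax(1))   # predictions often differ
```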

A related area of research is ‘robust machine learning’. We seek machine learning algorithms that work well even when their assumptions are violated. The biggest assumption in machine learning is that the training data are drawn independently and are a representative sample of the future inputs to the system. Several researchers are exploring ways of making machine learning systems more robust to failures of this assumption.
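One common diagnostic for a violated representativeness assumption, offered here purely as an illustration rather than a method named in the interview, is to train a classifier to tell training-era data apart from newly collected data; if it succeeds much better than chance, the distribution has shifted and robustness measures are needed.

```python
# Illustrative "domain classifier" check for distribution shift. The synthetic
# data and the choice of GradientBoostingClassifier are assumptions only.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
train_data = rng.normal(0.0, 1.0, size=(500, 5))        # data used to build the model
deployed_data = rng.normal(0.6, 1.0, size=(500, 5))     # data arriving after deployment (shifted)

X = np.vstack([train_data, deployed_data])
domain = np.array([0] * 500 + [1] * 500)                # 0 = training era, 1 = deployment era

auc = cross_val_score(GradientBoostingClassifier(), X, domain,
                      cv=5, scoring="roc_auc").mean()
print(f"domain-classifier AUC = {auc:.2f}  (about 0.5 would mean no detectable shift)")
```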

The third major challenge for machine learning is the question of bias. There are often biases in the way that data are collected. For example, experiments on the effectiveness of new drugs may be performed only on men. A machine learning system might then learn that the drugs are only effective for people older than 35 years. But in women, the effectiveness might be completely different. In a company, data might be collected from current customers, but these data might not be useful for predicting how new customers will behave, because the new customers might be different in some important way (younger, more internet-savvy, etc.). Current research is developing methods for detecting such biases and for creating learning algorithms that can recover from these biases.
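A basic first step toward detecting such biases, sketched here as an illustration rather than a specific published method, is simply to evaluate a trained model separately on each subgroup of interest; large gaps in accuracy suggest a biased model or biased data collection. The model, predictions, and group labels in the usage line are hypothetical placeholders.

```python
# Hedged sketch of a per-subgroup evaluation; `y_test`, `model`, `X_test`, and
# `sex_test` in the usage line are hypothetical placeholders.
import numpy as np

def per_group_accuracy(y_true, y_pred, group):
    """Report accuracy for each subgroup; large gaps suggest a biased model or data."""
    return {g: float(np.mean(y_pred[group == g] == y_true[group == g]))
            for g in np.unique(group)}

# usage sketch:
# print(per_group_accuracy(y_test, model.predict(X_test), group=sex_test))
```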

NSR: With the rapid progress of machine learning, will human jobs be threatened by machines? Could you comment on the ‘singularity theory’ and the arguments about the risks of advanced AI?

Dietterich: Like all new technologies, machine learning is definitely going to change the job market. Jobs that involve simple repetitive activity—whether it is repeated physical actions (like factory work and truck driving) or repeated intellectual actions (like much work in law, accounting, and medicine)—will likely be at least partially replaced by software and robots. As with the Industrial Revolution, there is likely to be a large disruption in the economy as these new technologies are developed. The important question is whether machine learning and AI will also create new kinds of jobs. This also occurred during the Industrial Revolution, and I think it will happen again in the AI revolution. It is hard to predict what these jobs will be.

I think about what happened when the internet was developed. I was a graduate student in the early 1980s when the Internet Protocols were developed and deployed. They were designed to make it easy to move files from one computer to another and to log in to remote computers from local computers. We had no idea about the world wide web, search engines, electronic commerce or social networks! This means we also did not predict the new jobs that resulted (web page designers, user experience engineers, digital advertising, recommendation system designers, cyber security engineers and so on).

I think it is similarly very difficult today to predict what the jobs of the future will be. There will certainly be jobs involved in creating AI systems, teaching them, customizing them and repairing them. I suspect that it will not be cost-effective to completely automate most existing jobs. Instead, maybe 80% of each job will be automated, but a human will need to do the remaining 20%. That human thereby becomes much more valuable and will be paid well.

One aspect of many human jobs that I believe will be very difficult to automate is empathy. Robots and AI systems will have very different experiences than people. Unlike people, they will not be able to ‘put themselves into a person's shoes’ in order to understand and empathize with humans. Instead, they will need to be taught, like aliens or like Commander Data in Star Trek, to predict and understand human emotions. In contrast, people are naturally able to do these things, because we all know ‘what it feels like’ to be human. So jobs that involve empathy (e.g. counseling, coaching, management, customer service) are least likely to be satisfactorily automated. This will be particularly true if human customers place a value on ‘authentic human interaction’ rather than accepting an interaction with a robot or automated system.

If most industrial and agricultural production becomes automated—and if the resulting wealth is evenly distributed throughout society—then humans are likely to find other things to do with their time. Traveling to visit other countries and other cultures will likely become even more popular than it is today. Sports, games, music and the arts may also become much more popular. One hundred years ago, it was hard to get a massage or a pedicure. Now these are available almost everywhere. Who knows what people will want to do and what experiences they will want to have 100 years from now?

There are two different popular notions of the ‘singularity’. Let me talk about each of them. One notion, exemplified by the writings of Ray Kurzweil, is that because of the exponential improvement of many technologies, it is difficult for us to see very far into the future. This is the idea of a ‘technological singularity’. A true mathematical singularity would be the point at which technology improves infinitely quickly. But this is impossible, because there are limits to all technologies (although we don’t know what they are). There is a famous law in economics due to Herbert Stein: ‘If something can’t go on forever, it won’t.’ This is true for Moore's Law, and it is true for all AI technologies. However, even if a true mathematical singularity is impossible, we are currently experiencing exponential growth in the capabilities of AI systems, so their future capabilities will be very different from their current capabilities, and standard extrapolation is impossible. So I believe Kurzweil is correct that we cannot see very far into this exponentially-changing future.

There is a second notion of ‘singularity’ that refers to the rise of so-called superintelligence. The argument—first put forth by I.J. Good in an article in 1965—is that at some point AI technology will cross a threshold where it will be able to improve itself recursively and then it will very rapidly improve and become exponentially smarter than people. It will be the ‘last invention’ of humanity. Often, the threshold is assumed to be ‘human-level AI’, where the AI system matches human intelligence. I am not convinced by this argument for several reasons. First, the whole goal of machine learning is to create computer systems that can learn autonomously. This is a form of self-improvement (and often it is applied to improve the learning system itself and hence is recursive self-improvement). However, such systems have never been able to improve themselves beyond one iteration. That is, they improve themselves, but then the resulting system is not able to improve itself. I believe the reason for this is that we formulate the problem as a problem of function optimization, and once you have found the optimal value of that function, further optimization cannot improve it, by definition. To maintain exponential improvements, every technology requires repeated breakthroughs. Moore's Law for example is not a single process, but actually a staircase of improvements where each ‘step’ involved a different breakthrough. I believe this leads us back to the Kurzweil-type technological singularity rather than to superintelligence.

Second, it is very suspicious that the arguments about superintelligence set the threshold to match human intelligence. This strikes me as the same error that was exposed by Copernicus and by Darwin. Humans are presumably not special in any way with respect to the intelligence that computers can attain. The limits we encounter are probably dictated by many factors including the size and computing power of our brains, the durations of our lives, and the fact that each one of us must learn on our own (rather than like parallel and distributed computers). Computers are already more intelligent than people on a wide range of tasks including job shop scheduling, route planning, control of aircraft, simulation of complex systems (e.g. the atmosphere), web search, memory, arithmetic, certain forms of theorem proving, and so on. But none of these super-human capabilities has led to the kind of superintelligence described by Good.

Third, we observe in humans that intelligence tends to involve breadth rather than depth. A great physicist such as Stephen Hawking is much smarter than me about cosmology, but I am more knowledgeable than he is about machine learning. Furthermore, experiments have revealed that people who are experts in one aspect of human endeavor are no better than average in most other aspects. This suggests that the metaphor of intelligence as rungs on a ladder, which is the basis of the argument on recursive self-improvement, is the wrong metaphor. Instead, we should consider the metaphor of a liquid spreading across the surface of a table or the metaphor of biodiversity where each branch of knowledge fills a niche in the rain forest of human intelligence. This metaphor does not suggest that there is some threshold that, once exceeded, will lead to superintelligence.

The claim that Kurzweil's view of the singularity is the right one does not mean that AI technology is inherently safe and that we have nothing to worry about. Far from it. Indeed, as computers become more intelligent, we are motivated to put them in charge of high-risk decision making such as controlling self-driving cars, managing the power grid, or fighting wars (as autonomous weapon systems). As I mentioned above, the machine learning technology of today is not sufficiently reliable or robust to be entrusted with such dangerous decisions. I am very concerned that premature deployment of AI technologies could lead to a major loss of life because of some bug in the machine learning components. Much like the HAL 9000 in ‘2001: A Space Odyssey’, computers could ‘take over the world’ because we gave them autonomous control of important systems and then there was a programming error or a machine learning failure. I do not believe that computers will spontaneously ‘decide’ to take over the world; that is just a science fiction story line. I also don’t believe that computers will ‘want to be like us’; that is another story line that goes back at least to the Pinocchio story (and perhaps there is an even older story in Chinese culture?).

I think people should be “in the loop” in all of these high-risk decision making applications. — Thomas Dietterich

People may not be as accurate or as fast as computers in making decisions, but we are more robust to unanticipated aspects of the world and hence better able to recognize and respond to failures in the computer system. For this reason, I think people should be ‘in the loop’ in all of these high-risk decision making applications.

NSR: Many machine learning professors in the US have moved to big companies. Could you comment on this?

Dietterich: Yes, there has been a substantial ‘brain drain’ as professors move to companies. Let me discuss the causes and the effects of this. There are several causes. First, because many companies are engaged in a race to develop new AI products, they are offering very large salaries to professors. Second, because many machine learning techniques (especially deep learning) require large amounts of data and because companies are able to collect large amounts of data, it is much easier to do research on ‘big data’ and deep learning at companies than at universities. Third, companies can also afford to purchase or develop special computers for deep learning such as GPU computers or Google's Tensor Processing Units (TPUs). This is another thing that is very difficult to do in universities.

What are the effects of this? The main effect is that universities can’t train as many students in AI and machine learning as they could in the past, because they lack the professors to teach and guide research. Universities also lack access to big data sets and to special computers. Industry and government should address these problems by providing funds for collecting data sets and for purchasing specialized computers. I’m not sure how governments can address the brain drain problem, but they can address the data and computing problems.

With the exception of work on big data and deep learning, all other forms of machine learning (and all of the challenges that I listed above) are easy to study in university laboratories. In our lab at Oregon State University, for example, we are studying anomaly detection, reinforcement learning and robust machine learning.

NSR: Could you comment on the contributions and impact to the field that are coming from China? Do you feel any obstacle encumbering Chinese researchers from making higher impact?

Dietterich: Chinese scientists (working both inside and outside China) are making huge contributions to the development of machine learning and AI technologies. China is a leader in deep learning for speech recognition and natural language translation, and I am expecting many more contributions from Chinese researchers as a result of the major investments of government and industry in AI research in China. I think the biggest obstacle to having higher impact is communication. Most computer science research is published in English, and because English is difficult for Mandarin speakers to learn, this makes it difficult for Chinese scientists to write papers and give presentations that have a big impact. The same problem occurs in the reverse direction. China is now the home of a major fraction of AI research (I would guess at least 25%). People in the West who do not read Chinese are slow to learn about advances in China. I hope that the ongoing improvements in language translation will help lower the language barrier. A related communication problem is that the internet connection between China and the rest of the world is often difficult to use. This makes it hard to have teleconferences or Skype meetings, and that often means that researchers in China are not included in international research projects.

NSR: What suggestions will you give young researchers entering this field?

Dietterich: My first suggestion is that students learn as much mathematics as possible. Mathematics is central to machine learning, and math is difficult to learn on your own. So I recommend that all university students study mathematics. My second suggestion is to read the literature as much as possible. Don’t just read the deep learning papers, but study the theory of machine learning, AI and algorithms. New learning algorithms arise from deep understanding of the mathematics and the structure of optimization problems. Don’t forget that ideas in other branches of knowledge (e.g. statistics, operations research, information theory, economics, game theory, philosophy of science, neuroscience, psychology) have been very important to the development of machine learning and AI. It is valuable to gain experience working in teams. Most research today is collaborative, so you should get practice working in teams and learning how to resolve conflicts. Finally, it is important to cultivate your skills in programming and in communication. Learn to program well and to master the latest software engineering tools. Learn to write and to speak well. This goes far beyond just learning English grammar and vocabulary. You must learn how to tell a compelling story about your research that brings out the key ideas and places them in context.


Here are the Most Common Problems Being Solved by Machine Learning

By: MIT xPRO, August 5, 2020

Although machine learning offers important new capabilities for solving today’s complex problems, many organizations may be tempted to apply machine learning techniques as a one-size-fits-all solution.

To use machine learning effectively, engineers and scientists need a clear understanding of the most common issues that machine learning can solve. In a recent MIT xPRO machine learning whitepaper titled “Applications For Machine Learning In Engineering and the Physical Sciences,” Professor Youssef Marzouk and fellow MIT colleagues outlined the potential and limitations of machine learning in STEM.

Here are some common challenges that can be solved by machine learning:

Accelerate processing and increase efficiency. Machine learning can wrap around existing science and engineering models to create fast and accurate surrogates, identify key patterns in model outputs, and help further tune and refine the models. All this helps more quickly and accurately predict outcomes at new inputs and design conditions.
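As a hedged sketch of the surrogate idea, the example below fits a fast regression model to a handful of runs of an invented "expensive" simulation and then queries it at new design conditions. The toy simulator and the choice of a Gaussian process are illustrative assumptions; the whitepaper does not prescribe a specific method.

```python
# Illustrative surrogate model: fit a cheap regressor to a few runs of a slow
# simulation, then predict at new design points. The simulator is invented.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def expensive_simulation(x):                          # stand-in for a slow physics model
    return np.sin(3 * x) + 0.5 * x

X_train = np.linspace(0, 2, 15).reshape(-1, 1)        # a few carefully chosen runs
y_train = expensive_simulation(X_train).ravel()

surrogate = GaussianProcessRegressor().fit(X_train, y_train)
X_new = np.array([[0.37], [1.84]])                    # new design conditions
pred, std = surrogate.predict(X_new, return_std=True)
print(pred, std)                                      # fast predictions with uncertainty estimates
```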

Quantify and manage risk. Machine learning can be used to model the probability of different outcomes in a process that cannot easily be predicted due to randomness or noise. This is especially valuable for situations where reliability and safety are paramount.
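One way to quantify such outcome uncertainty, sketched here under invented assumptions rather than any technique named in the whitepaper, is quantile regression: fit models for low and high quantiles to obtain a prediction band that widens where the process is noisier.

```python
# Illustrative uncertainty quantification via quantile gradient boosting on an
# invented noisy process; not a method prescribed by the whitepaper.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 5, size=(1000, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.2 + 0.1 * X[:, 0])   # noise grows with the input

lo = GradientBoostingRegressor(loss="quantile", alpha=0.1).fit(X, y)   # 10th percentile
hi = GradientBoostingRegressor(loss="quantile", alpha=0.9).fit(X, y)   # 90th percentile
X_new = np.array([[1.0], [4.0]])
print(list(zip(lo.predict(X_new), hi.predict(X_new))))   # wider band where the process is noisier
```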

Compensate for missing data. Gaps in a data set can severely limit accurate learning, inference, and prediction. Models trained by machine learning improve with more relevant data. When used correctly, machine learning can also help synthesize missing data that round out incomplete datasets.
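A hedged sketch of model-based imputation for gaps in a data set follows. scikit-learn's IterativeImputer is one example tool chosen for illustration; the whitepaper does not name a specific library.

```python
# Illustrative model-based imputation of missing values with scikit-learn.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (activates the estimator)
from sklearn.impute import IterativeImputer

data = np.array([[1.0, 2.0, 3.0],
                 [2.0, np.nan, 6.0],       # a missing measurement
                 [3.0, 6.0, 9.0],
                 [4.0, 8.0, np.nan]])      # another gap

filled = IterativeImputer(random_state=0).fit_transform(data)
print(np.round(filled, 2))                 # gaps replaced with model-based estimates
```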

Make more accurate predictions or conclusions from your data. You can streamline your data-to-prediction pipeline by tuning how your machine learning model’s parameters are updated during training. Building better models of your data will also improve the accuracy of subsequent predictions.

Solve complex classification and prediction problems. Predicting how an organism’s genome will be expressed or what the climate will be like in fifty years are examples of highly complex problems. Many modern machine learning problems take thousands or even millions of data samples (or far more) across many dimensions to build expressive and powerful predictors, often pushing far beyond traditional statistical methods.

Create new designs. There is often a disconnect between what designers envision and how products are made. It’s costly and time-consuming to simulate every variation of a long list of design variables. Machine learning can identify key variables, automatically generate good options, and help designers identify which best fits their requirements.

Increase yields. Manufacturers aim to overcome inconsistency in equipment performance and predict maintenance by applying machine learning to flag defects and quality issues before products ship to customers, improve efficiency on the production line, and increase yields by optimizing the use of manufacturing resources.

Machine learning is undoubtedly hitting its stride, as engineers and physical scientists leverage the competitive advantage of big data across industries — from aerospace, to construction, to pharmaceuticals, transportation, and energy. But it has never been more important to understand the physics-based models, computational science, and engineering paradigms upon which machine learning solutions are built.

The list above details the most common problems that organizations can solve with machine learning. For more specific applications across engineering and the physical sciences, download MIT xPRO’s free Machine Learning whitepaper .


Current Machine Learning Research Directions

Here are some current research questions and open problems in machine learning that still need further work:

  • e.g., learning to classify webpages or spam
  • In supervised learning we usually learn a single function f, but often our goal is to learn a family of related functions (e.g., a diagnosis function for patients in New York hospitals and one for patients in Lahore hospitals).
  • Hierarchical Bayesian approaches provide one way to tackle this problem.
  • The goal is to develop a theoretical understanding of the relationships among ML algorithms, and of when it is appropriate to use each.
  • Drug Testing where one wishes to learn the drug effectiveness while minimizing the exposure of patients to possible unknown side effects.
  • Toxicity Forecasting : How Machine Learning Will Identify The Drugs Of The Future
  • “Protein Folding Problem” - how can you accurately predict the 3D structure of a protein from its amino acid sequence?
  • Cryptographic approaches.
  • A program to learn to read the web might learn a graded set of capabilities beginning with simpler abilities such as learning to recognize names of people and places , and extending to extracting complex relational information spread across multiple sentences and web pages.
  • A key research issue here is self-supervised learning and constructing an appropriate graded curriculum .
  • e.g., Reinforcement Learning: reward-based learning.
  • Machine vs Human learning : role of motivation, fear, urgency, forgetting, and learning over multiple time scales
  • The goal is to develop a general theory of learning processes covering animals as well as machines.
  • The goal is to design programming language constructs for declaring what training experience should be given to each “to be learned” subroutine, when, and with what safeguards against arbitrary changes to program behavior.
  • e.g., incorporation of multiple sensory modalities (e.g., vision, sound, touch) to provide a setting in which self-supervised learning could be applied to predict one sensory experience from the others (see the sketch after this list).
  • It is an important research problem to develop learning algorithms that are trained on one distribution and generalize well to another.
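As referenced in the multi-modal item above, here is a hedged sketch of the self-supervised idea: use one "sensory" channel as the prediction target for the others, so that no human labels are needed. The three synthetic sensor streams and the ridge-regression model are invented purely for illustration.

```python
# Illustrative multi-modal self-supervision: predict one synthetic sensor stream
# from another, with no human-provided labels. All data below are invented.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
t = rng.uniform(0, 10, size=2000)
vision = np.column_stack([np.sin(t), np.cos(t)]) + 0.05 * rng.normal(size=(2000, 2))
sound = 2.0 * np.sin(t) - 0.5 * np.cos(t) + 0.05 * rng.normal(size=2000)   # correlated channel

# supervision comes "for free": predict the sound channel from the vision channel
X_train, X_test, y_train, y_test = train_test_split(vision, sound, random_state=0)
model = Ridge().fit(X_train, y_train)
print("R^2 predicting one modality from another:", round(model.score(X_test, y_test), 3))
```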

Tom M. Mitchell, “The Discipline of Machine Learning”.

Andrew Ng, “Machine Learning Yearning”.

I hope you enjoyed reading this post. Please feel free to add comments with your questions; I would be glad to answer them.


Machine Learning: Recently Published Documents


An explainable machine learning model for identifying geographical origins of sea cucumber Apostichopus japonicus based on multi-element profile

A comparison of machine learning- and regression-based models for predicting ductility ratio of RC beam-column joints

Alexa, is this a historical record?

Digital transformation in government has brought an increase in the scale, variety, and complexity of records and greater levels of disorganised data. Current practices for selecting records for transfer to The National Archives (TNA) were developed to deal with paper records and are struggling to deal with this shift. This article examines the background to the problem and outlines a project that TNA undertook to research the feasibility of using commercially available artificial intelligence tools to aid selection. The project AI for Selection evaluated a range of commercial solutions varying from off-the-shelf products to cloud-hosted machine learning platforms, as well as a benchmarking tool developed in-house. Suitability of tools depended on several factors, including requirements and skills of transferring bodies as well as the tools’ usability and configurability. This article also explores questions around trust and explainability of decisions made when using AI for sensitive tasks such as selection.

Automated Text Classification of Maintenance Data of Higher Education Buildings Using Text Mining and Machine Learning Techniques

Data-driven analysis and machine learning for energy prediction in distributed photovoltaic generation plants: a case study in Queensland, Australia

Modeling nutrient removal by membrane bioreactor at a sewage treatment plant using machine learning models

Big five personality prediction based in Indonesian tweets using machine learning methods

The popularity of social media has drawn the attention of researchers who have conducted cross-disciplinary studies examining the relationship between personality traits and behavior on social media. Most current work focuses on personality prediction analysis of English texts, but Indonesian has received scant attention. Therefore, this research aims to predict user’s personalities based on Indonesian text from social media using machine learning techniques. This paper evaluates several machine learning techniques, including naive Bayes (NB), K-nearest neighbors (KNN), and support vector machine (SVM), based on semantic features including emotion, sentiment, and publicly available Twitter profile. We predict the personality based on the big five personality model, the most appropriate model for predicting user personality in social media. We examine the relationships between the semantic features and the Big Five personality dimensions. The experimental results indicate that the Big Five personality exhibit distinct emotional, sentimental, and social characteristics and that SVM outperformed NB and KNN for Indonesian. In addition, we observe several terms in Indonesian that specifically refer to each personality type, each of which has distinct emotional, sentimental, and social features.

Compressive strength of concrete with recycled aggregate; a machine learning-based evaluation

Temperature prediction of flat steel box girders of long-span bridges utilizing in situ environmental parameters and machine learning

Computer-assisted cohort identification in practice

The standard approach to expert-in-the-loop machine learning is active learning, where, repeatedly, an expert is asked to annotate one or more records and the machine finds a classifier that respects all annotations made until that point. We propose an alternative approach, IQRef , in which the expert iteratively designs a classifier and the machine helps him or her to determine how well it is performing and, importantly, when to stop, by reporting statistics on a fixed, hold-out sample of annotated records. We justify our approach based on prior work giving a theoretical model of how to re-use hold-out data. We compare the two approaches in the context of identifying a cohort of EHRs and examine their strengths and weaknesses through a case study arising from an optometric research problem. We conclude that both approaches are complementary, and we recommend that they both be employed in conjunction to address the problem of cohort identification in health research.


ScienceDaily

Science has an AI problem: This group says they can fix it

32 questions to tamp down a smoldering crisis of confidence.

AI holds the potential to help doctors find early markers of disease and policymakers to avoid decisions that lead to war. But a growing body of evidence has revealed deep flaws in how machine learning is used in science, a problem that has swept through dozens of fields and implicated thousands of erroneous papers.

Now an interdisciplinary team of 19 researchers, led by Princeton University computer scientists Arvind Narayanan and Sayash Kapoor, has published guidelines for the responsible use of machine learning in science.

"When we graduate from traditional statistical methods to machine learning methods, there are a vastly greater number of ways to shoot oneself in the foot," said Narayanan, director of Princeton's Center for Information Technology Policy and a professor of computer science. "If we don't have an intervention to improve our scientific standards and reporting standards when it comes to machine learning-based science, we risk not just one discipline but many different scientific disciplines rediscovering these crises one after another."

The authors say their work is an effort to stamp out this smoldering crisis of credibility that threatens to engulf nearly every corner of the research enterprise. A paper detailing their guidelines appeared May 1 in the journal Science Advances.

Because machine learning has been adopted across virtually every scientific discipline, with no universal standards safeguarding the integrity of those methods, Narayanan said the current crisis, which he calls the reproducibility crisis, could become far more serious than the replication crisis that emerged in social psychology more than a decade ago.

The good news is that a simple set of best practices can help resolve this newer crisis before it gets out of hand, according to the authors, who come from computer science, mathematics, social science and health research.

"This is a systematic problem with systematic solutions," said Kapoor, a graduate student who works with Narayanan and who organized the effort to produce the new consensus-based checklist.

The checklist focuses on ensuring the integrity of research that uses machine learning. Science depends on the ability to independently reproduce results and validate claims. Otherwise, new work cannot be reliably built atop old work, and the entire enterprise collapses. While other researchers have developed checklists that apply to discipline-specific problems, notably in medicine, the new guidelines start with the underlying methods and apply them to any quantitative discipline.

One of the main takeaways is transparency. The checklist calls on researchers to provide detailed descriptions of each machine learning model, including the code, the data used to train and test the model, the hardware specifications used to produce the results, the experimental design, the project's goals and any limitations of the study's findings. The standards are flexible enough to accommodate a wide range of nuance, including private datasets and complex hardware configurations, according to the authors.

While the increased rigor of these new standards might slow the publication of any given study, the authors believe wide adoption of these standards would increase the overall rate of discovery and innovation, potentially by a lot.

"What we ultimately care about is the pace of scientific progress," said sociologist Emily Cantrell, one of the lead authors, who is pursuing her Ph.D. at Princeton. "By making sure the papers that get published are of high quality and that they're a solid base for future papers to build on, that potentially then speeds up the pace of scientific progress. Focusing on scientific progress itself and not just getting papers out the door is really where our emphasis should be."

Kapoor concurred. The errors hurt. "At the collective level, it's just a major time sink," he said. That time costs money. And that money, once wasted, could have catastrophic downstream effects, limiting the kinds of science that attract funding and investment, tanking ventures that are inadvertently built on faulty science, and discouraging countless numbers of young researchers.

In working toward a consensus about what should be included in the guidelines, the authors said they aimed to strike a balance: simple enough to be widely adopted, comprehensive enough to catch as many common mistakes as possible.

They say researchers could adopt the standards to improve their own work; peer reviewers could use the checklist to assess papers; and journals could adopt the standards as a requirement for publication.

"The scientific literature, especially in applied machine learning research, is full of avoidable errors," Narayanan said. "And we want to help people. We want to keep honest people honest."


Story Source:

Materials provided by Princeton University, Engineering School. Original written by Scott Lyon. Note: Content may be edited for style and length.

Journal Reference :

  • Sayash Kapoor, Emily M. Cantrell, Kenny Peng, Thanh Hien Pham, Christopher A. Bail, Odd Erik Gundersen, Jake M. Hofman, Jessica Hullman, Michael A. Lones, Momin M. Malik, Priyanka Nanayakkara, Russell A. Poldrack, Inioluwa Deborah Raji, Michael Roberts, Matthew J. Salganik, Marta Serra-Garcia, Brandon M. Stewart, Gilles Vandewiele, Arvind Narayanan. REFORMS: Consensus-based Recommendations for Machine-learning-based Science. Science Advances, 2024; 10 (18). DOI: 10.1126/sciadv.adk3452


MIT Technology Review

Too many AI researchers think real-world problems are not relevant

By Hannah Kerner

Any researcher who’s focused on applying machine learning to real-world problems has likely received a response like this one: “The authors present a solution for an original and highly motivating problem, but it is an application and the significance seems limited for the machine-learning community.”

These words are straight from a review I received for a paper I submitted to the NeurIPS (Neural Information Processing Systems) conference, a top venue for machine-learning research. I’ve seen the refrain time and again in reviews of papers where my coauthors and I presented a method motivated by an application, and I’ve heard similar stories from countless others.

This makes me wonder: If the community feels that aiming to solve high-impact real-world problems with machine learning is of limited significance, then what are we trying to achieve?

The goal of artificial intelligence (pdf) is to push forward the frontier of machine intelligence. In the field of machine learning, a novel development usually means a new algorithm or procedure, or—in the case of deep learning—a new network architecture. As others have pointed out, this hyperfocus on novel methods leads to a scourge of papers that report marginal or incremental improvements on benchmark data sets and exhibit flawed scholarship (pdf) as researchers race to top the leaderboard .

Meanwhile, many papers that describe new applications present both novel concepts and high-impact results. But even a hint of the word “application” seems to spoil the paper for reviewers. As a result, such research is marginalized at major conferences. Their authors’ only real hope is to have their papers accepted in workshops, which rarely get the same attention from the community.

This is a problem because machine learning holds great promise for advancing health, agriculture, scientific discovery, and more. The first image of a black hole was produced using machine learning. The most accurate predictions of protein structures , an important step for drug discovery, are made using machine learning. If others in the field had prioritized real-world applications, what other groundbreaking discoveries would we have made by now?

This is not a new revelation. To quote a classic paper titled “Machine Learning that Matters” (pdf), by NASA computer scientist Kiri Wagstaff: “Much of current machine learning research has lost its connection to problems of import to the larger world of science and society.” The same year that Wagstaff published her paper, a convolutional neural network called AlexNet won a high-profile competition for image recognition centered on the popular ImageNet data set, leading to an explosion of interest in deep learning. Unfortunately, the disconnect she described appears to have grown even worse since then.

The wrong questions

Marginalizing applications research has real consequences. Benchmark data sets, such as ImageNet or COCO , have been key to advancing machine learning. They enable algorithms to train and be compared on the same data. However, these data sets contain biases that can get built into the resulting models.

More than half of the images in ImageNet (pdf) come from the US and Great Britain, for example. That imbalance leads systems to inaccurately classify images in categories that differ by geography (pdf). Popular face data sets, such as the AT&T Database of Faces, contain primarily light-skinned male subjects, which leads to systems that struggle to recognize dark-skinned and female faces.

While researchers try to outdo one another on contrived benchmarks, one in every nine people in the world is starving.

When studies on real-world applications of machine learning are excluded from the mainstream, it’s difficult for researchers to see the impact of their biased models, making it far less likely that they will work to solve these problems.

One reason applications research is minimized might be that others in machine learning think this work consists of simply applying methods that already exist. In reality, though, adapting machine-learning tools to specific real-world problems takes significant algorithmic and engineering work. Machine-learning researchers who fail to realize this and expect tools to work “off the shelf” often wind up creating ineffective models. Either they evaluate a model’s performance using metrics that don’t translate to real-world impact, or they choose the wrong target altogether.

For example, most studies applying deep learning to echocardiogram analysis try to surpass a physician’s ability to predict disease. But predicting normal heart function (pdf) would actually save cardiologists more time by identifying patients who do not need their expertise. Many studies applying machine learning to viticulture aim to optimize grape yields (pdf) , but winemakers “want the right levels of sugar and acid, not just lots of big watery berries,” says Drake Whitcraft of Whitcraft Winery in California.

More harm than good

Another reason applications research should matter to mainstream machine learning is that the field’s benchmark data sets are woefully out of touch with reality.

New machine-learning models are measured against large, curated data sets that lack noise and have well-defined, explicitly labeled categories (cat, dog, bird). Deep learning does well for these problems because it assumes a largely stable world (pdf) .

But in the real world, these categories are constantly changing over time or according to geographic and cultural context. Unfortunately, the response has not been to develop new methods that address the difficulties of real-world data; rather, there’s been a push for applications researchers to create their own benchmark data sets.

The goal of these efforts is essentially to squeeze real-world problems into the paradigm that other machine-learning researchers use to measure performance. But the domain-specific data sets are likely to be no better than existing versions at representing real-world scenarios. The results could do more harm than good. People who might have been helped by these researchers’ work will become disillusioned by technologies that perform poorly when it matters most.

Because of the field’s misguided priorities, people who are trying to solve the world’s biggest challenges are not benefiting as much as they could from AI’s very real promise. While researchers try to outdo one another on contrived benchmarks, one in every nine people in the world is starving . Earth is warming and sea level is rising at an alarming rate.

As neuroscientist and AI thought leader Gary Marcus once wrote (pdf): “AI’s greatest contributions to society … could and should ultimately come in domains like automated scientific discovery, leading among other things towards vastly more sophisticated versions of medicine than are currently possible. But to get there we need to make sure that the field as a whole doesn’t first get stuck in a local minimum.”

For the world to benefit from machine learning, the community must again ask itself, as Wagstaff once put it: “What is the field’s objective function?” If the answer is to have a positive impact in the world, we must change the way we think about applications.

Artificial intelligence

Large language models can do jaw-dropping things. but nobody knows exactly why..

And that's a problem. Figuring it out is one of the biggest scientific puzzles of our time and a crucial step towards controlling more powerful future models.

  • Will Douglas Heaven archive page

What’s next for generative video

OpenAI's Sora has raised the bar for AI moviemaking. Here are four things to bear in mind as we wrap our heads around what's coming.

The AI Act is done. Here’s what will (and won’t) change

The hard work starts now.

  • Melissa Heikkilä archive page

Is robotics about to have its own ChatGPT moment?

Researchers are using generative AI and other techniques to teach robots new skills—including tasks they could perform in homes.

Stay connected

Get the latest updates from mit technology review.

Discover special offers, top stories, upcoming events, and more.

Thank you for submitting your email!

It looks like something went wrong.

We’re having trouble saving your preferences. Try refreshing this page and updating them one more time. If you continue to get this message, reach out to us at [email protected] with a list of newsletters you’d like to receive.

Princeton University

Princeton engineering, science has an ai problem. this group says they can fix it..

By Scott Lyon

May 1, 2024

Illustrated team of scientists around a table with data visualized on the wall.

Researchers recommend 32 best practices to stamp out a smoldering crisis that threatens to engulf all of science: thousands of AI-driven claims across dozens of fields that cannot be reproduced. Illustration courtesy Adobe Stock

AI holds the potential to help doctors find early markers of disease and policymakers to avoid decisions that lead to war. But a growing body of evidence has revealed deep flaws in how machine learning is used in science, a problem that has swept through dozens of fields and implicated thousands of erroneous papers.

Now an interdisciplinary team of 19 researchers, led by Princeton University computer scientists Arvind Narayanan and Sayash Kapoor, has published guidelines for the responsible use of machine learning in science.

“When we graduate from traditional statistical methods to machine learning methods, there are a vastly greater number of ways to shoot oneself in the foot,” said Narayanan , director of Princeton’s Center for Information Technology Policy and a professor of computer science . “If we don’t have an intervention to improve our scientific standards and reporting standards when it comes to machine learning-based science, we risk not just one discipline but many different scientific disciplines rediscovering these crises one after another.”

The authors say their work is an effort to stamp out this smoldering crisis of credibility that threatens to engulf nearly every corner of the research enterprise. A paper detailing their guidelines appeared May 1 in the journal Science Advances.

Because machine learning has been adopted across virtually every scientific discipline, with no universal standards safeguarding the integrity of those methods, Narayanan said the current crisis, which he calls the reproducibility crisis, could become far more serious than the replication crisis that emerged in social psychology more than a decade ago.

The good news is that a simple set of best practices can help resolve this newer crisis before it gets out of hand, according to the authors, who come from computer science, mathematics, social science and health research.

“This is a systematic problem with systematic solutions,” said Kapoor, a graduate student who works with Narayanan and who organized the effort to produce the new consensus-based checklist.

The checklist focuses on ensuring the integrity of research that uses machine learning. Science depends on the ability to independently reproduce results and validate claims. Otherwise, new work cannot be reliably built atop old work, and the entire enterprise collapses. While other researchers have developed checklists that apply to discipline-specific problems, notably in medicine, the new guidelines start with the underlying methods and apply them to any quantitative discipline.

One of the main takeaways is transparency. The checklist calls on researchers to provide detailed descriptions of each machine learning model, including the code, the data used to train and test the model, the hardware specifications used to produce the results, the experimental design, the project’s goals and any limitations of the study’s findings. The standards are flexible enough to accommodate a wide range of nuance, including private datasets and complex hardware configurations, according to the authors.
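To make the transparency requirement concrete, here is a minimal, purely illustrative sketch of what a machine-readable reporting manifest for a study might look like. The field names and values are hypothetical and do not reproduce the actual REFORMS checklist items.

```python
# Illustrative only: a minimal machine-readable "reporting manifest" a study
# might publish alongside its results. Field names are hypothetical and are
# not the actual REFORMS checklist items.
import json

manifest = {
    "model": {
        "type": "gradient_boosted_trees",
        "library": "scikit-learn 1.4",
        "hyperparameters": {"n_estimators": 500, "learning_rate": 0.05},
    },
    "data": {
        "training_set": "cohort_2015_2020.csv (n=12,480)",
        "test_set": "held-out 2021 cohort (n=3,102)",
        "leakage_checks": "no subject appears in both splits",
    },
    "hardware": "single NVIDIA A100, 64 GB RAM",
    "code": "https://example.org/archive/commit-hash",  # placeholder URL
    "limitations": "labels derived from billing codes; external validity untested",
}

print(json.dumps(manifest, indent=2))
```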

While the increased rigor of these new standards might slow the publication of any given study, the authors believe wide adoption of these standards would increase the overall rate of discovery and innovation, potentially by a lot.

“What we ultimately care about is the pace of scientific progress,” said sociologist Emily Cantrell, one of the lead authors, who is pursuing her Ph.D. at Princeton. “By making sure the papers that get published are of high quality and that they’re a solid base for future papers to build on, that potentially then speeds up the pace of scientific progress. Focusing on scientific progress itself and not just getting papers out the door is really where our emphasis should be.”

Kapoor concurred. The errors hurt. “At the collective level, it’s just a major time sink,” he said. That time costs money. And that money, once wasted, could have catastrophic downstream effects, limiting the kinds of science that attract funding and investment, tanking ventures that are inadvertently built on faulty science, and discouraging countless numbers of young researchers.

In working toward a consensus about what should be included in the guidelines, the authors said they aimed to strike a balance: simple enough to be widely adopted, comprehensive enough to catch as many common mistakes as possible.

They say researchers could adopt the standards to improve their own work; peer reviewers could use the checklist to assess papers; and journals could adopt the standards as a requirement for publication.

“The scientific literature, especially in applied machine learning research, is full of avoidable errors,” Narayanan said. “And we want to help people. We want to keep honest people honest.”

The paper, “REFORMS: Consensus-based Recommendations for Machine-learning-based Science” (Science Advances, May 1, 2024; DOI: 10.1126/sciadv.adk3452), included the following authors:

Sayash Kapoor, Princeton University; Emily Cantrell, Princeton University; Kenny Peng, Cornell University; Thanh Hien (Hien) Pham, Princeton University; Christopher A. Bail, Duke University; Odd Erik Gundersen, Norwegian University of Science and Technology; Jake M. Hofman, Microsoft Research; Jessica Hullman, Northwestern University; Michael A. Lones, Heriot-Watt University; Momin M. Malik, Center for Digital Health, Mayo Clinic; Priyanka Nanayakkara, Northwestern; Russell A. Poldrack, Stanford University; Inioluwa Deborah Raji, University of California-Berkeley; Michael Roberts, University of Cambridge; Matthew J. Salganik, Princeton University; Marta Serra-Garcia, University of California-San Diego; Brandon M. Stewart, Princeton University; Gilles Vandewiele, Ghent University; and Arvind Narayanan, Princeton University.




Computer Science > Machine Learning

Title: GANsemble for Small and Imbalanced Data Sets: A Baseline for Synthetic Microplastics Data

Abstract: Microplastic particle ingestion or inhalation by humans is a problem of growing concern. Unfortunately, current research methods that use machine learning to understand their potential harms are obstructed by a lack of available data. Deep learning techniques in particular are challenged by such domains where only small or imbalanced data sets are available. Overcoming this challenge often involves oversampling underrepresented classes or augmenting the existing data to improve model performance. This paper proposes GANsemble: a two-module framework connecting data augmentation with conditional generative adversarial networks (cGANs) to generate class-conditioned synthetic data. First, the data chooser module automates augmentation strategy selection by searching for the best data augmentation strategy. Next, the cGAN module uses this strategy to train a cGAN for generating enhanced synthetic data. We experiment with the GANsemble framework on a small and imbalanced microplastics data set. A Microplastic-cGAN (MPcGAN) algorithm is introduced, and baselines for synthetic microplastics (SYMP) data are established in terms of Frechet Inception Distance (FID) and Inception Scores (IS). We also provide a synthetic microplastics filter (SYMP-Filter) algorithm to increase the quality of generated SYMP. Additionally, we show the best amount of oversampling with augmentation to fix class imbalance in small microplastics data sets. To our knowledge, this study is the first application of generative AI to synthetically create microplastics data.
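To make the conditional-GAN idea concrete, below is a generic cGAN sketch in PyTorch. This is not the authors’ MPcGAN; the class count, latent dimension, flattened 64×64 image size, and network widths are all assumptions made purely for illustration.

```python
# A generic conditional GAN (cGAN) sketch illustrating class-conditioned
# synthetic data generation. NOT the paper's MPcGAN; sizes are placeholders.
import torch
import torch.nn as nn

NUM_CLASSES, LATENT_DIM, IMG_DIM = 5, 100, 64 * 64

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.label_emb = nn.Embedding(NUM_CLASSES, NUM_CLASSES)
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM + NUM_CLASSES, 256), nn.ReLU(),
            nn.Linear(256, IMG_DIM), nn.Tanh(),
        )

    def forward(self, z, labels):
        # Condition the noise vector on the class label.
        return self.net(torch.cat([z, self.label_emb(labels)], dim=1))

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.label_emb = nn.Embedding(NUM_CLASSES, NUM_CLASSES)
        self.net = nn.Sequential(
            nn.Linear(IMG_DIM + NUM_CLASSES, 256), nn.LeakyReLU(0.2),
            nn.Linear(256, 1), nn.Sigmoid(),
        )

    def forward(self, x, labels):
        return self.net(torch.cat([x, self.label_emb(labels)], dim=1))

# One illustrative training step on a fake minibatch.
G, D = Generator(), Discriminator()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

real_imgs = torch.rand(32, IMG_DIM)                 # stand-in for real samples
labels = torch.randint(0, NUM_CLASSES, (32,))
z = torch.randn(32, LATENT_DIM)

# Discriminator step: real vs. generated samples of the same class.
fake_imgs = G(z, labels).detach()
d_loss = bce(D(real_imgs, labels), torch.ones(32, 1)) + \
         bce(D(fake_imgs, labels), torch.zeros(32, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: try to fool the discriminator for the requested class.
g_loss = bce(D(G(z, labels), labels), torch.ones(32, 1))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```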


Machine Learning Optimization Techniques: A Survey, Classification, Challenges, and Future Research Issues

  • Review article
  • Published: 29 March 2024


  • Kewei Bian 1 &
  • Rahul Priyadarshi   ORCID: orcid.org/0000-0001-5725-9812 2  


Optimization approaches in machine learning (ML) are essential for training models to obtain high performance across numerous domains. This article provides a comprehensive overview of ML optimization strategies, emphasizing their classification, obstacles, and potential areas for further study. We begin by studying the historical progression of optimization methods, emphasizing significant developments and their influence on contemporary algorithms. We then analyse present research to identify widespread optimization algorithms and their uses in supervised, unsupervised, and reinforcement learning. Common optimization challenges, including non-convexity, scalability issues, convergence problems, and concerns about robustness and generalization, are also explored. We suggest that future research should focus on scalability problems, innovative optimization techniques, domain-knowledge integration, and improved interpretability. The present study aims to provide an in-depth review of ML optimization by combining insights from historical advancements, literature evaluations, and current issues to guide future research efforts.
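As a concrete illustration of the kind of optimizer such a survey classifies, the NumPy sketch below compares plain gradient descent with a momentum variant on a small quadratic problem. The matrix, step size, and iteration count are arbitrary choices for demonstration, not recommendations.

```python
# A minimal NumPy sketch of two common optimizers: plain gradient descent and
# gradient descent with momentum, minimizing f(w) = 0.5 * w^T A w.
import numpy as np

rng = np.random.default_rng(0)
A = np.diag([1.0, 10.0, 100.0])          # an ill-conditioned quadratic
grad = lambda w: A @ w

def gd(w, lr=0.01, steps=200):
    for _ in range(steps):
        w = w - lr * grad(w)
    return w

def gd_momentum(w, lr=0.01, beta=0.9, steps=200):
    v = np.zeros_like(w)
    for _ in range(steps):
        v = beta * v + grad(w)            # accumulate a velocity term
        w = w - lr * v
    return w

w0 = rng.normal(size=3)
print("plain GD distance to optimum:", np.linalg.norm(gd(w0)))
print("momentum distance to optimum:", np.linalg.norm(gd_momentum(w0)))
```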



Author information

Authors and affiliations

Kewei Bian: Department of Linguistics and Translation, City University of Hong Kong, 83 Tat Chee Ave, Kowloon Tong, Hong Kong, 999077, China

Rahul Priyadarshi: Faculty of Engineering & Technology, ITER, Siksha ‘O’ Anusandhan University, Bhubaneswar, 751030, India

Corresponding author: Rahul Priyadarshi

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Bian, K., Priyadarshi, R. Machine Learning Optimization Techniques: A Survey, Classification, Challenges, and Future Research Issues. Arch Computat Methods Eng (2024). https://doi.org/10.1007/s11831-024-10110-w


Received : 25 December 2023

Accepted : 18 March 2024

Published : 29 March 2024

DOI : https://doi.org/10.1007/s11831-024-10110-w



7 Major Challenges Faced By Machine Learning Professionals


In machine learning, building or training a model starts with analyzing data. Machine learning is everywhere: from Amazon product recommendations to self-driving cars, it delivers value across industries. As per recent market research, the global machine learning market is expected to grow by 43% by 2024. This growth has sharply increased the demand for machine learning professionals: AI and machine learning jobs have grown by roughly 75% over the past four years, and the industry continues to expand. A career in machine learning offers job satisfaction, excellent growth, and a high salary, but getting there is a complex and challenging process.


Machine learning professionals face many challenges as they build ML skills and create applications from scratch. What are these challenges? In this blog, we will discuss seven major challenges faced by machine learning professionals. Let’s have a look.

1. Poor Quality of Data

Data plays a significant role in the machine learning process, and one of the biggest issues professionals face is the absence of good-quality data. Unclean and noisy data can make the whole process extremely exhausting, and we don’t want our algorithm to make inaccurate or faulty predictions. Data quality is therefore essential to good output, so preprocessing, which includes removing outliers, imputing missing values, and dropping unwanted features, must be done with great care, as in the short sketch below.
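A minimal illustration of such preprocessing with pandas and scikit-learn follows; the column names, values, and clipping threshold are made up for the example.

```python
# A minimal preprocessing sketch: impute missing values, clip an implausible
# outlier, and drop an uninformative column. All data here is made up.
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame({
    "age": [25, 32, np.nan, 41, 29, 230],      # 230 is an obvious outlier
    "income": [40e3, 52e3, 61e3, np.nan, 48e3, 55e3],
    "row_id": [1, 2, 3, 4, 5, 6],              # identifier, not a feature
})

df = df.drop(columns=["row_id"])               # remove an unwanted feature

# Clip implausible ages instead of silently learning from them.
df["age"] = df["age"].clip(upper=100)

# Fill remaining gaps with the column median.
imputer = SimpleImputer(strategy="median")
clean = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
print(clean)
```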

2. Underfitting of Training Data

Underfitting occurs when a model is too simple to capture the relationship between the input and output variables. It is like trying to fit into undersized jeans: the model, not the data, lacks the capacity to represent the underlying pattern. To overcome this issue:

  • Increase the training time of the model
  • Increase the complexity of the model
  • Add more features to the data
  • Reduce the regularization parameters

The sketch below illustrates the “increase the complexity of the model” remedy.
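Here is a small scikit-learn sketch: a linear model underfits data generated from a cubic trend, while a degree-3 polynomial pipeline captures it. The data-generating function, noise level, and degrees are illustrative choices.

```python
# Illustrating underfitting: a linear model cannot capture a cubic trend,
# while a higher-capacity polynomial model can. Data is synthetic.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 200).reshape(-1, 1)
y = (X.ravel() ** 3 - 3 * X.ravel()) + rng.normal(scale=1.0, size=200)

linear = LinearRegression().fit(X, y)
cubic = make_pipeline(PolynomialFeatures(degree=3), LinearRegression()).fit(X, y)

print("linear R^2:", round(linear.score(X, y), 3))   # much lower: underfits
print("cubic  R^2:", round(cubic.score(X, y), 3))    # close to 1: trend captured
```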

3. Overfitting of Training Data

Overfitting occurs when a model learns its training data too closely, including its noise and biases, and as a result performs poorly on new data. It is like trying to fit into oversized jeans. Unfortunately, this is one of the significant issues faced by machine learning professionals: an algorithm trained on noisy or biased data will carry those flaws into its predictions. Let’s understand this with the help of an example. Consider a model trained to differentiate between a cat, a rabbit, a dog, and a tiger, where the training data contains 1000 cats, 1000 dogs, 1000 tigers, and 4000 rabbits. Because rabbits dominate the training set, there is a considerable probability that the model will identify a cat as a rabbit. Here we had a vast amount of data, but it was biased, so the predictions were negatively affected.

We can tackle this issue by:

  • Analyzing the data with the utmost care
  • Using data augmentation techniques
  • Removing outliers from the training set
  • Selecting a model with fewer features

The short sketch below shows how comparing training and test accuracy exposes overfitting, and how limiting model capacity reduces it.
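In this scikit-learn sketch, an unconstrained decision tree memorizes noisy labels (near-perfect training accuracy, lower test accuracy), while limiting its depth narrows the gap. The synthetic dataset and depth values are illustrative.

```python
# Detecting overfitting by comparing training and test accuracy on a noisy
# synthetic dataset; limiting model capacity (max_depth) narrows the gap.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, n_informative=5,
                           flip_y=0.2, random_state=0)   # noisy labels
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for depth in [None, 3]:
    model = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    print(f"max_depth={depth}: train={model.score(X_tr, y_tr):.2f} "
          f"test={model.score(X_te, y_te):.2f}")
```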

4. Machine Learning is a Complex Process

The machine learning industry is young and continuously changing, and rapid trial-and-error experimentation is the norm. Because the process keeps transforming, there are many chances for error, which makes learning complex. It involves analyzing the data, removing data bias, training the model, applying complex mathematical calculations, and much more, so it is a genuinely complicated process and another big challenge for machine learning professionals.

5. Lack of Training Data

The most important task in the machine learning process is to train models on enough data to achieve accurate output. Too little training data produces inaccurate or biased predictions. Let us understand this with the help of an example. Training a machine learning algorithm is a bit like teaching a child: one day you decide to show a child how to distinguish between an apple and a watermelon by pointing out the differences in color, shape, and taste, and soon the child can tell them apart. A machine learning algorithm, on the other hand, needs a lot of data to make the same distinction, and for complex problems it may require millions of examples. Therefore we need to ensure that machine learning algorithms are trained with sufficient amounts of data; a learning-curve check, sketched below, is a quick way to tell whether more data would help.
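If cross-validated accuracy keeps climbing as the training set grows, the model is probably data-starved. The sketch below uses a synthetic dataset purely for illustration.

```python
# A quick learning-curve check: does accuracy keep improving with more data?
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

X, y = make_classification(n_samples=2000, n_features=30, n_informative=10,
                           random_state=0)
sizes, train_scores, val_scores = learning_curve(
    LogisticRegression(max_iter=1000), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5), cv=5)

for n, score in zip(sizes, val_scores.mean(axis=1)):
    print(f"{n:4d} training examples -> cv accuracy {score:.3f}")
```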

6. Slow Implementation

This is one of the common issues faced by machine learning professionals. Machine learning models can deliver accurate results, but training and running them takes a tremendous amount of time: slow programs, data overload, and excessive resource requirements all add up. Models also require constant monitoring and maintenance to keep delivering the best output.
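One frequent, fixable cause of slowness is unvectorized Python. The small sketch below contrasts a pure-Python loop with its NumPy equivalent; exact timings depend on the machine.

```python
# Replacing an explicit Python loop with a vectorized NumPy call often gives a
# large speedup. Timings are machine-dependent.
import time
import numpy as np

x = np.random.rand(1_000_000)

t0 = time.perf_counter()
total_loop = sum(v * v for v in x)          # pure-Python loop
t1 = time.perf_counter()
total_vec = float(np.dot(x, x))             # vectorized equivalent
t2 = time.perf_counter()

print(f"loop: {t1 - t0:.3f}s  vectorized: {t2 - t1:.3f}s")
print("same result:", np.isclose(total_loop, total_vec))
```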

7. Imperfections in the Algorithm When Data Grows

So you have found quality data, trained your model well, and the predictions are concise and accurate. Yay, you have learned how to create a machine learning algorithm! But wait, there is a twist: the model may become useless in the future as the data grows and its distribution shifts. Today’s best model may become inaccurate in the coming years and require retraining, so you need regular monitoring and maintenance to keep the algorithm working. This is one of the most exhausting issues faced by machine learning professionals.
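Monitoring can be as simple as comparing incoming data against the training-time distribution and flagging drift before predictions degrade. The sketch below checks the standardized mean shift of one feature; the threshold is a made-up value for illustration, not a standard.

```python
# A minimal data-drift check: flag when a feature's mean in production shifts
# too far from its training-time baseline, so retraining can be scheduled.
import numpy as np

rng = np.random.default_rng(0)
train_feature = rng.normal(loc=0.0, scale=1.0, size=10_000)   # training-time data
live_feature = rng.normal(loc=0.6, scale=1.2, size=2_000)     # new incoming data

def drifted(baseline, current, z_threshold=0.5):
    # Flag drift when the standardized mean shift exceeds the threshold.
    shift = abs(current.mean() - baseline.mean()) / baseline.std()
    return shift > z_threshold, shift

flag, shift = drifted(train_feature, live_feature)
print(f"standardized mean shift = {shift:.2f}, retrain recommended: {flag}")
```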

Conclusion: Machine learning is set to bring a major transformation in technology. It is one of the most rapidly growing technologies, used in medical diagnosis, speech recognition, robot training, product recommendations, video surveillance, and much more. This continuously evolving domain offers immense job satisfaction, excellent opportunities, global exposure, and a high salary, but it is a high-risk, high-return technology. Before starting your machine learning journey, carefully examine the challenges mentioned above. To learn this fantastic technology, you need to plan carefully, stay patient, and maximize your efforts. Once you win this battle, you can conquer the future of work and land your dream job!


current research problems in machine learning

Science has an AI problem: Research group says they can fix it

A I holds the potential to help doctors find early markers of disease and policymakers to avoid decisions that lead to war. But a growing body of evidence has revealed deep flaws in how machine learning is used in science, a problem that has swept through dozens of fields and implicated thousands of erroneous papers.

Now an interdisciplinary team of 19 researchers, led by Princeton University computer scientists Arvind Narayanan and Sayash Kapoor, has published guidelines for the responsible use of machine learning in science.

"When we graduate from traditional statistical methods to machine learning methods, there are a vastly greater number of ways to shoot oneself in the foot," said Narayanan, director of Princeton's Center for Information Technology Policy and a professor of computer science.

"If we don't have an intervention to improve our scientific standards and reporting standards when it comes to machine learning-based science, we risk not just one discipline but many different scientific disciplines rediscovering these crises one after another."

The authors say their work is an effort to stamp out this smoldering crisis of credibility that threatens to engulf nearly every corner of the research enterprise. A paper detailing their guidelines appears May 1 in the journal Science Advances .

Because machine learning has been adopted across virtually every scientific discipline, with no universal standards safeguarding the integrity of those methods, Narayanan said the current crisis, which he calls the reproducibility crisis, could become far more serious than the replication crisis that emerged in social psychology more than a decade ago.

The good news is that a simple set of best practices can help resolve this newer crisis before it gets out of hand, according to the authors, who come from computer science, mathematics, social science and health research.

"This is a systematic problem with systematic solutions," said Kapoor, a graduate student who works with Narayanan and who organized the effort to produce the new consensus-based checklist.

The checklist focuses on ensuring the integrity of research that uses machine learning. Science depends on the ability to independently reproduce results and validate claims. Otherwise, new work cannot be reliably built atop old work, and the entire enterprise collapses.

While other researchers have developed checklists that apply to discipline-specific problems, notably in medicine, the new guidelines start with the underlying methods and apply them to any quantitative discipline.

One of the main takeaways is transparency. The checklist calls on researchers to provide detailed descriptions of each machine learning model, including the code, the data used to train and test the model, the hardware specifications used to produce the results, the experimental design, the project's goals and any limitations of the study's findings.

The standards are flexible enough to accommodate a wide range of nuance, including private datasets and complex hardware configurations, according to the authors.

While the increased rigor of these new standards might slow the publication of any given study, the authors believe that wide adoption would substantially increase the overall rate of discovery and innovation.

"What we ultimately care about is the pace of scientific progress," said sociologist Emily Cantrell, one of the lead authors, who is pursuing her Ph.D. at Princeton.

"By making sure the papers that get published are of high quality and that they're a solid base for future papers to build on, that potentially then speeds up the pace of scientific progress. Focusing on scientific progress itself and not just getting papers out the door is really where our emphasis should be."

Kapoor concurred. The errors hurt. "At the collective level, it's just a major time sink," he said. That time costs money. And that money, once wasted, could have catastrophic downstream effects: limiting the kinds of science that attract funding and investment, tanking ventures inadvertently built on faulty science, and discouraging countless young researchers.

In working toward a consensus about what should be included in the guidelines, the authors said they aimed to strike a balance: simple enough to be widely adopted, comprehensive enough to catch as many common mistakes as possible.

They say researchers could adopt the standards to improve their own work; peer reviewers could use the checklist to assess papers; and journals could adopt the standards as a requirement for publication.

"The scientific literature, especially in applied machine learning research, is full of avoidable errors," Narayanan said. "And we want to help people. We want to keep honest people honest."

More information: Sayash Kapoor et al., REFORMS: Consensus-based Recommendations for Machine-learning-based Science, Science Advances (2024). DOI: 10.1126/sciadv.adk3452. www.science.org/doi/10.1126/sciadv.adk3452

Provided by Princeton University

  • Review Article
  • Open access
  • Published: 01 April 2022

Current progress and open challenges for applying deep learning across the biosciences

  • Nicolae Sapoval   ORCID: orcid.org/0000-0002-0736-5075 1   na1 ,
  • Amirali Aghazadeh 2   na1 ,
  • Michael G. Nute 1 ,
  • Dinler A. Antunes   ORCID: orcid.org/0000-0001-7947-6455 3 ,
  • Advait Balaji 1 ,
  • Richard Baraniuk 4 ,
  • C. J. Barberan 4 ,
  • Ruth Dannenfelser 1 ,
  • Chen Dun 1 ,
  • Mohammadamin Edrisi   ORCID: orcid.org/0000-0002-9738-1916 1 ,
  • R. A. Leo Elworth 1 ,
  • Bryce Kille 1 ,
  • Anastasios Kyrillidis 1 ,
  • Luay Nakhleh   ORCID: orcid.org/0000-0003-3288-6769 1 ,
  • Cameron R. Wolfe 1 ,
  • Zhi Yan   ORCID: orcid.org/0000-0003-2433-5553 1 ,
  • Vicky Yao   ORCID: orcid.org/0000-0002-3201-9983 1 &
  • Todd J. Treangen   ORCID: orcid.org/0000-0002-3760-564X 1 , 5  

Nature Communications, volume 13, Article number: 1728 (2022)

  • Computational biology and bioinformatics
  • Computer science
  • Machine learning

Deep Learning (DL) has recently enabled unprecedented advances in one of the grand challenges in computational biology: the half-century-old problem of protein structure prediction. In this paper we discuss recent advances, limitations, and future perspectives of DL across five broad areas: protein structure prediction, protein function prediction, genome engineering, systems biology and data integration, and phylogenetic inference. We discuss each application area and cover the main bottlenecks of DL approaches, such as training data, problem scope, and the ability to leverage existing DL architectures in new contexts. To conclude, we provide a summary of the subject-specific and general challenges for DL across the biosciences.

Introduction.

The recent success of AlphaFold2 1 in predicting the 3D structure of proteins from their sequences highlights one of the most effective applications of deep learning in computational biology to date. Deep learning (DL) finds representations of data at multiple layers of abstraction using models composed of several layers of nonlinear computational units (Fig. 1). As the success of DL across a broad variety of application domains shows, its efficacy depends on the development of specialized neural network architectures that capture important properties of the data, such as spatial locality (convolutional neural networks – CNNs), sequential structure (recurrent neural networks – RNNs), context dependence (Transformers), and data distribution (autoencoders – AEs). Figure 1 illustrates six DL architectures that have found the most applications within computational biology. We refer the reader to LeCun et al. 2 for a complete review of DL methods and architectures and keep the focus of this paper on computational biology applications. These DL models have revolutionized speech recognition, visual object recognition, and object detection, and have lately played a key role in solving important problems in computational biology. Applications of DL in some areas of computational biology, such as functional biology, are growing rapidly, while in other areas, such as phylogenetics, they are still in their infancy. Given this wide divide in how readily different areas of computational biology have adopted DL, some key questions remain unanswered: (1) What makes an area ripe for DL methods? (2) What are the potential limitations of DL for computational biology applications? (3) Which DL model is most appropriate for a specific application area in computational biology?

Figure 1. The top panel summarizes the three most common paradigms of machine learning: supervised learning, in which the dataset contains ground-truth labels; unsupervised learning, in which it does not; and reinforcement learning, in which an algorithmic agent interacts with a real or simulated environment. The bottom panels give an overview of the most prevalent DL architectures, each designed to achieve the specific goals highlighted. Short descriptions are also provided for other common components of DL architectures mentioned in the manuscript.
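
For readers coming from outside DL, the following minimal PyTorch sketch (not taken from any cited method; all layer sizes are arbitrary placeholders) shows skeletal versions of the architecture families named above.

```python
# Minimal PyTorch skeletons of the architecture families discussed above.
# All sizes are arbitrary placeholders, not values used by any cited method.
import torch
import torch.nn as nn

seq_len, vocab, hidden = 100, 4, 64               # e.g. a one-hot DNA window

cnn = nn.Sequential(                               # spatial locality
    nn.Conv1d(vocab, hidden, kernel_size=9, padding=4),
    nn.ReLU(),
    nn.AdaptiveMaxPool1d(1),
)

rnn = nn.LSTM(input_size=vocab, hidden_size=hidden,
              batch_first=True, bidirectional=True)  # sequential structure

transformer = nn.TransformerEncoder(               # long-range context via attention
    nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True),
    num_layers=2,
)

autoencoder = nn.Sequential(                       # compress and reconstruct the data
    nn.Linear(seq_len * vocab, hidden), nn.ReLU(),
    nn.Linear(hidden, seq_len * vocab),
)

x = torch.randn(8, vocab, seq_len)                 # a batch of 8 dummy inputs
print(cnn(x).shape)                                # torch.Size([8, 64, 1])
out, _ = rnn(x.transpose(1, 2))                    # LSTM expects (batch, seq, features)
print(out.shape)                                   # torch.Size([8, 100, 128])
```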

In this paper, we aim to address these foundational questions from the lens of computational biology. The answers, however, are highly task specific and can only be addressed in the context of the corresponding applications. The pitfalls of applying machine learning (ML) in genomics have been discussed in Whalen et al. 3 , but our goal is to provide a perspective on the impact of DL across five distinct areas. While there are multiple areas of interest in the biosciences where DL has achieved notable successes (e.g. DeepVariant 4 , DeepArg 5 , metagenomic binning 6 , and lab-of-origin attribution 7 ), we focus on a few diverse and broad subtopics. In those areas we evaluate the improvements that DL has brought over classical ML techniques in computational biology, with varying levels of success to date (Fig.  2 ). For each area, we explore the limitations of current approaches and opportunities for improvement, and include practical tips. We anchor our discussions around five broad, distinct areas in computational biology: protein structure prediction, protein function prediction, genome engineering, systems biology and data integration, and phylogenetic inference (Table  1 ). These areas span a range of impact levels, from major paradigm shifts (AlphaFold2) to DL applications in their infancy (phylogenetic inference), and collectively they provide enough technical diversity to address the questions raised in this perspective. Over the next several subsections, we review progress in each of these areas, grouped into four tiers of impact: (i) paradigm shifting (where DL clearly outperforms other ML and classical approaches and provides a field-wide impact), (ii) major success (where DL performance is typically higher than that of other ML and classical approaches), (iii) moderate success (where DL performance is typically comparable to other ML and classical approaches), and (iv) minor success (where DL methods are not widely adopted or underperform compared to other ML and classical approaches). We then discuss common challenges for DL in the biosciences (Table  2 ).

Figure 2. For each of the areas considered in this manuscript, the figure summarizes the estimated sizes of key datasets and databases, along with their projected growth rates. The rightmost column lists the most popular DL architectures applied to the corresponding areas of the biosciences.

Paradigm shifting successes of DL

Protein structure prediction.

We start our discussion with protein structure prediction, arguably one of the most successful applications of DL in computational biology; this is the success we refer to as a paradigm shift. It is well established that a protein’s amino acid sequence largely determines its 3D structure, which is in turn directly related to its function (e.g., chemical reaction catalysis, signal transduction, or scaffolding) 8 , 9 . The history of the protein structure prediction problem goes back to the determination of the 3D structure of myoglobin by John Kendrew in the 1950s, a landmark in biochemistry and structural biology 10 . Since then, X-ray crystallography has become the gold-standard experimental method for protein structure determination 11 , 12 , as well as the reference for validating computational models of protein structure. Given the high cost and technical limitations of X-ray crystallography, and the growing access to biological sequences following the Human Genome Project, predicting the 3D structure of a protein from its sequence became the Mount Everest of computational biology 8 , a challenge broadly known as the “protein folding problem”. Initial efforts concentrated on biophysically accurate energy functions and knowledge-based statistical reasoning, but faster progress was recently achieved with a greater focus on DL.

One of the key reasons for the recent success of DL in this area has been the wealth of unsupervised data in the form of multiple sequence alignments (MSAs) 1 , 9 , 13 , 14 , 15 , 16 , 17 , which has enabled learning a nonlinear, evolution-informed representation of proteins. Progress in the field has been accelerated by the creation of a biennial international competition, the Critical Assessment of Protein Structure Prediction (CASP). Launched in 1994, CASP created the means to objectively test available methods through blind predictions, providing competing groups with a set of challenges (i.e., sequences of proteins with unknown structures) and evaluating their performance against the respective experimentally determined structures. In its first CASP participation (CASP13), AlphaFold, developed by the DeepMind group at Google, made the news by clearly outperforming the second-best method 14 , with an improvement nearly twice what would have been projected from previous editions 18 . Following recent trends in the field 13 , 16 , 19 , 20 , AlphaFold and AlphaFold2 leverage the combined use of DL and MSAs 18 , 21 . This proved to be a winning strategy for overcoming the lack of large training datasets of protein structures. The Protein Data Bank (PDB) 22 is the reference database for experimentally determined macromolecular structures and currently hosts close to 180,000 entries. This is a small number of data points for the complex mapping involved in the problem, and the entries are further biased by the technical constraints of the experimental methods. Protein sequence data, on the other hand, are available on a much larger scale. MSAs therefore allow modeling methods to extract pairwise evolutionary correlations from this larger corpus, maximizing what can be learned from the available structural data. Other key factors for the success of DL in this area include innovations in model design, such as new attention strategies tuned to the invariances and symmetries of proteins, graph-based representations, and model recycling strategies.
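
As a rough illustration of the idea that pairwise evolutionary correlations can be read directly off an MSA, the NumPy sketch below computes a naive column-pair coupling score on a toy alignment. Real co-evolution pipelines (e.g., Potts models or the learned attention in AlphaFold2) are far more sophisticated; the alignment, alphabet handling, and score here are purely illustrative assumptions.

```python
# Illustrative only: estimate pairwise coupling between MSA columns from
# one-hot co-occurrence statistics. Real co-evolution methods use far more
# sophisticated statistics than this raw column covariance.
import numpy as np

msa = np.array([list(s) for s in [
    "MKVLA",        # toy alignment: 4 sequences x 5 columns
    "MKILA",
    "MRVLG",
    "MRILG",
]])
alphabet = sorted(set(msa.ravel()))
aa_index = {a: i for i, a in enumerate(alphabet)}

n_seq, n_col = msa.shape
onehot = np.zeros((n_seq, n_col, len(alphabet)))
for s in range(n_seq):
    for c in range(n_col):
        onehot[s, c, aa_index[msa[s, c]]] = 1.0

# Covariance between flattened per-column one-hot blocks, summarized per column pair.
flat = onehot.reshape(n_seq, n_col * len(alphabet))
cov = np.cov(flat, rowvar=False)
k = len(alphabet)
coupling = np.zeros((n_col, n_col))
for i in range(n_col):
    for j in range(n_col):
        coupling[i, j] = np.abs(cov[i*k:(i+1)*k, j*k:(j+1)*k]).sum()

print(np.round(coupling, 2))   # columns 2 and 5 co-vary; column 3 varies independently
```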

The impact of AlphaFold2 on the field of structural biology is undeniable; it successfully demonstrated the use of a DL-based implementation for high-accuracy protein structure prediction 21 . This achievement is already driving and accelerating further developments in the field, as highlighted by the remarkable number of early citations. In addition, DeepMind has partnered with the European Molecular Biology Laboratory (EMBL) 23 to create an open-access database of protein structures modeled with AlphaFold2 17 . The database already covers 98.5% of human proteins, for which at least 36% of the amino acid residues were predicted with high confidence. Finally, rather than retiring experimental methods, DL-based methods might augment their accuracy and reach, as demonstrated by preliminary applications to solving challenging structures with data from X-ray crystallography and cryo-EM 1 , 15 . However, many caveats, limitations, and open questions 8 , 9 remain. In particular, while AlphaFold2 successfully predicts the static structure of a protein, many key insights into a protein’s biological function come from its dynamic conformations. Furthermore, the dynamics of interactions among multiple proteins still present open challenges in the field. Moving forward, it will be important to monitor the application of DL to these follow-up research areas.

Major successes of DL

Protein function prediction.

Predicting protein function is a natural next step after protein structure prediction. Protein function prediction involves mapping target proteins to curated ontologies such as Gene Ontology (GO) terms, which span the Biological Process (BP), Molecular Function (MF), and Cellular Component (CC) categories. Protein structure can convey substantial information about these ontologies; however, there is no direct mapping between structure and function, and the relationship is often very complex 24 . Despite the tremendous growth of protein sequences available in the UniProtKB database, functional annotations for the vast majority of proteins remain partly or completely unknown 25 . Limited and imbalanced training examples, a large output space of possible functions, and the hierarchical nature of GO labels are some of the main bottlenecks associated with functional annotation of proteins 26 . To overcome some of these issues, recent methods have leveraged features from different sources, including sequence 27 , structure 22 , interaction networks 28 , scientific literature, homology, and domain information 29 , and often incorporate one or more DL architectures to handle different stages of the prediction task (e.g. feature representation, feature selection, and classification).

One of the most successful DL approaches to the problem, DeepGO 30 , uses a CNN to learn sequence-level embeddings and combines them with knowledge graph embeddings for each protein obtained 31 from protein–protein interaction (PPI) networks. DeepGO was one of the first DL-based models to perform better than BLAST 32 and previous methods on functional annotation tasks over the three GO categories 30 . An improved version of the tool, DeepGOPlus 33 , emerged as one of the top performers when compared to other tools in the CAFA3 challenge across the three GO categories 33 . DeepGOPlus uses convolutional filters of different sizes, each followed by max-pooling, to learn dense feature representations of protein sequences embedded in a one-hot encoding scheme. The authors showed that combining the CNN outputs with homology-based predictions from DIAMOND 34 can yield better predictive accuracy.
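
The sketch below is a minimal model in the spirit of the multi-width convolutional encoder just described: parallel filters of several widths, each globally max-pooled, applied to a one-hot protein sequence. The filter widths, filter counts, and placeholder GO-term output size are our assumptions, not the published DeepGOPlus hyperparameters.

```python
# Sketch of a multi-width CNN over a one-hot amino-acid sequence: parallel 1D
# convolutions of several widths, each globally max-pooled, concatenated and
# mapped to GO-term scores. Hyperparameters are placeholders.
import torch
import torch.nn as nn

AA = "ACDEFGHIKLMNPQRSTVWY"
N_GO_TERMS = 32            # placeholder output space

class MultiWidthCNN(nn.Module):
    def __init__(self, widths=(8, 16, 32), n_filters=64):
        super().__init__()
        self.convs = nn.ModuleList(
            [nn.Conv1d(len(AA), n_filters, w, padding=w // 2) for w in widths]
        )
        self.head = nn.Linear(n_filters * len(widths), N_GO_TERMS)

    def forward(self, x):                      # x: (batch, 20, seq_len)
        pooled = [c(x).amax(dim=-1) for c in self.convs]   # global max-pool
        return torch.sigmoid(self.head(torch.cat(pooled, dim=-1)))

def one_hot(seq):
    x = torch.zeros(len(AA), len(seq))
    for i, aa in enumerate(seq):
        x[AA.index(aa), i] = 1.0
    return x

model = MultiWidthCNN()
scores = model(one_hot("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ").unsqueeze(0))
print(scores.shape)        # (1, 32): one score per (placeholder) GO term
```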

Unsupervised methods such as DAEs have also been instrumental in learning dense, robust, and low-dimensional representations of proteins. Chicco et al. 35 developed a DAE to represent proteins for assigning missing GO annotations and showed 6% to 36% improvements compared to non-DL methods over six different GO datasets. Miranda and Hu 36 introduced stacked denoising autoencoders (sdAE) to learn more robust representations of proteins. Gligorijević et al. introduced deepNF 37 , which uses a multimodal DAE (MDA) to extract features from multiple heterogeneous interaction networks and outperforms methods based on matrix factorization and linear regression 37 . Methods for learning low-dimensional embeddings of proteins continue to grow.

Beyond predicting Gene Ontology labels, studies have also focused on several other task-specific functional categories, such as identifying specific enzyme functions 38 and potential post-translational modification sites 39 . These studies are a fundamental step towards developing novel proteins with specialized functions or modifying the efficacy of existing proteins, as seen in recent advances of DL in enzyme engineering 40 . Going forward, applications of deep learning to engineering proteins tailored to specific functions can help increase the throughput of candidate proteins for pharmaceutical applications, among other domains.

Besides these canonical architectures, other approaches have used a combination of the above methods for function classification 41 . Overall, previous results indicate that models integrating features from multi-modal data types (e.g., sequence, structure, PPI) are more likely to outperform those that rely on a single data type. Trends in the literature indicate that relying on task-specific architectures could greatly enhance the feature representation obtained from the respective data types. Future work in this direction could focus on combining DAEs and RNNs for sequence-based representations, and Graph Convolutional Networks (GCNs) for structure-based as well as PPI-based information. Combining these representations in a hierarchical classifier, such as a multi-task DNN with biologically relevant regularization methods 42 , 43 , could allow for an explainable and computationally feasible DL architecture for protein function prediction.

Genome engineering

Biomedical engineering, and in particular genome engineering, is an important area of biology where DL models have been increasingly employed. Among genome engineering technologies, clustered regularly interspaced short palindromic repeats (CRISPR), a family of DNA sequences found in the genomes of prokaryotic organisms, has recently been harnessed to recognize and cleave specific locations in the human genome. In the CRISPR-associated protein 9 (Cas9) technology, a single-guide RNA (gRNA) steers the protein to a specific genomic target. When the 20-nucleotide gRNA sequence complements the genome, Cas9 creates a double-strand break (DSB) at the target (an on-target event). Owing to this ability to precisely target specific locations on the genome, CRISPR-based editing technologies have advanced enormously since the development of Cas9. However, recent studies have shown that multiple mismatches between the gRNA and genomic targets are tolerated and, as a result, Cas9 can cut unwanted locations in the genome (an off-target event). Off-target edits can have pathogenic effects on the functionality and integrity of the cell. Consequently, full clinical deployment of Cas9 has been slow, owing to insufficient efficiency, reliability, and controllability for therapeutic purposes. Reducing off-target activity while improving on-target efficiency has therefore become a central goal in genome engineering, and one increasingly targeted by DL techniques.

The sheer complexity of the biological processes involved in DNA repair, together with the growing availability of labeled data driven by a rapid drop in the cost of CRISPR assays, has made DL-based methods particularly successful at uncovering the root causes of these inefficiencies. The use of DL models was triggered by the observation that on-target and off-target events and the DNA repair outcome 44 are predictable from the sequence around the DSB, its location on the genome, and the potential mistargeted sequences elsewhere in the genome. Several computational tools have been successfully developed to design gRNAs with maximum on-target activity and minimum off-target effects 45 . DeepCas9 is among the CNN-based models that learn functional gRNAs directly from their canonical sequence representation 46 , 47 . The success of DeepCRISPR, on the other hand, relies on extracting about half a billion unlabeled gRNA sequences from human coding and non-coding regions and learning a low-dimensional representation of the gRNA 48 . DeepCRISPR also uses a data augmentation method to create fewer than a million sgRNAs with known knockout efficiencies to train a larger CNN model. CnnCrispr uses a language processing model to learn the representation of the gRNA and then employs a combination of a bidirectional LSTM and a CNN 49 , while RNNs underlie the success of other models 50 . Attention mechanisms have also been shown to improve accuracy in predicting on- and off-target effects 51 , 52 . ADAPT 53 is another recent CNN-based method, for fully automated CRISPR design for vertebrate-infecting viral diagnostics, which owes its success to the construction of a massive CRISPR training dataset. Recent methods for predicting the DNA repair outcome employ other strategies. SPROUT compensates for the lack of labels on harder-to-collect human CD4+ T cells by predicting summary statistics of the DNA repair outcome 54 . FORECasT employs a larger dataset from the easier-to-collect human chronic myelogenous leukemia cell line (K562) 55 . InDelphi creates hand-designed features of the input sequence, including the length and GC content of the homologous sequences around the cut site 56 , while CROTON avoids feature engineering and instead performs neural architecture search 57 . All of these strategies help reduce the number of labeled data points required to learn the input–output mapping.
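
To make the typical setup concrete, the sketch below one-hot encodes a 20-nt guide plus PAM context and regresses an on-target activity score with a small CNN. This is not any of the cited tools; the architecture, sizes, and example guides are placeholders.

```python
# Illustrative sketch of a common on-target-efficiency setup: one-hot encode a
# 20-nt guide plus 3-nt PAM and regress an activity score with a small CNN.
# Architecture and sizes are placeholders, not a published model.
import torch
import torch.nn as nn

BASES = "ACGT"

def encode(guide_plus_pam):                    # e.g. a 23-nt string
    x = torch.zeros(len(BASES), len(guide_plus_pam))
    for i, b in enumerate(guide_plus_pam):
        x[BASES.index(b), i] = 1.0
    return x

model = nn.Sequential(
    nn.Conv1d(4, 32, kernel_size=5, padding=2), nn.ReLU(),
    nn.Conv1d(32, 32, kernel_size=5, padding=2), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(),
    nn.Linear(32, 1),                          # predicted on-target efficiency
)

guides = ["GACGCATAAAGATGAGACGCTGG", "GTCGCTTTACGATGAGACGCAGG"]   # made-up guides
x = torch.stack([encode(g) for g in guides])
print(model(x).squeeze(-1))                    # one (untrained) score per guide
```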

The future of DL is geared towards new editing technologies such as CRISPR-Cas12a (Cpf1) 58 , base editors 59 , and prime editors 60 . While these methods do not introduce DSBs, their efficiency is still improving 61 ; in fact, DL has already shown promise in predicting the efficiency of adenine base editors (ABEs) and cytosine base editors (CBEs) 59 as well as prime editor 2 (PE2) activities in human cells 60 . The future challenges, however, lie in understanding these models. CRISPRLand is a recent framework that takes a first step towards interpretation and visualization of DL models in terms of higher-order interactions 62 . Besides explainability, we speculate that methods providing an uncertainty estimate of the prediction outcome will become more prevalent in genome editing. Further, owing to the significant effect of cell type on the efficiency of CRISPR experiments, it is critical to be aware of distribution shifts when deploying DL models in genome engineering. The integration of domain adaptation 63 methods to limit the effect of such distribution shifts is another important future direction.

Moderate successes of DL

Systems biology and data integration.

Systems biology takes a holistic view of modeling complex biological processes to ultimately unravel the link between genotype and phenotype. Integration of diverse -omics data is central to bridging this gap, enabling robust predictive models that have led to several recent breakthroughs spanning from basic biology 64 to precision medicine. These data are now more accessible than ever, thanks to improvements in sequencing technologies and the establishment of open-access public repositories where researchers can deposit their own studies, such as SRA 65 , GEO 65 , ArrayExpress 66 , and PRIDE 67 , as well as large coordinated efforts with structured multi-omic datasets: TCGA 68 , CCLE 69 , GTEx 70 , and ENCODE 71 . Given recent successes and the prevalence of both single- and co-assay data, the field is now focused on integrating different data types (e.g., genomics, transcriptomics, epigenomics, proteomics, metabolomics) on single individuals, across many individuals, within and between phenotypic groups, and across different organisms. Data integration tasks fall into two main categories: (1) integration across different platforms and studies of a single data type, at times with other non-omics data (e.g., protein–protein interactions, pathway annotations, motif presence), and (2) integration between different -omic data types (e.g., RNA-seq, ChIP-seq, ATAC-seq, BS-seq). Much progress has been made on integration within a single data type, especially transcriptomics, with classical ML and statistical approaches developed for batch correction 72 , 73 , 74 , 75 , modeling global gene co-expression patterns 76 , Bayesian integration strategies for function prediction 77 , 78 , and phenotype classification 79 . More recently, the increasing prevalence of single-cell transcriptomics has given rise to a new host of classical ML 80 , 81 , 82 and DL 83 , 84 approaches for data integration across experiments. DL methods in this space have arisen out of the need for methods that scale well with large numbers of cells and can model non-linear patterns of cell similarity 83 , 85 . Here, we have only skimmed the surface of methods being developed for expression data, but the same trend is emerging for other -omics data types, similarly driven by improved high-resolution experimental assays 86 , 87 . Broadly, data integration analyses that combine data, either from different studies or of different types, typically fall into one of three categories, depending on the stage at which the integration is performed 88 : concatenation-based, transformation-based, or model-based. While data integration across studies can involve data of a single type, here we focus on methods that integrate across different -omics types, as these questions introduce additional technical challenges and complexity.

Concatenation-based integration methods perform data integration early in the pipeline by combining data, in raw or processed form, before any joint modeling and analysis. Traditional ML concatenation-based methods are often unsupervised and typically use automatic feature extraction techniques such as lasso 89 , joint clustering schemes 90 , and dimensionality reduction 91 to find relevant signal. These methods are usually applied to well-curated, multi-omic datasets from large consortia (e.g., TCGA), and thus have most often been used to find meaningful patient subgroups characterized by distinct patterns across data modalities. More recently, autoencoders have been used as an initial data processing step to generate lower-dimensional embeddings that are then concatenated together as features for downstream models 92 , 93 . These approaches have improved performance over existing methods, likely owing to the advantages autoencoders have in denoising tasks and their ability to model nonlinear latent structure, even without sample labels.
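
A schematic sketch of this concatenation-based pattern, assuming toy random data: one autoencoder per -omics modality is trained for reconstruction, and the resulting embeddings are concatenated as features for any downstream model. Dimensions, training length, and the absence of batch correction and proper train/test splits are deliberate simplifications.

```python
# Schematic concatenation-based integration: one autoencoder per -omics layer,
# embeddings concatenated as features for a downstream model. Dimensions are
# arbitrary; real pipelines add batch correction and careful evaluation splits.
import torch
import torch.nn as nn

def autoencoder(n_features, n_latent=16):
    enc = nn.Sequential(nn.Linear(n_features, 128), nn.ReLU(),
                        nn.Linear(128, n_latent))
    dec = nn.Sequential(nn.Linear(n_latent, 128), nn.ReLU(),
                        nn.Linear(128, n_features))
    return enc, dec

n_samples = 64
omics = {"rna": torch.randn(n_samples, 2000),          # e.g. expression
         "methylation": torch.randn(n_samples, 5000)}  # e.g. methylation

encoders = {}
for name, X in omics.items():
    enc, dec = autoencoder(X.shape[1])
    opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)
    for _ in range(50):                                 # per-modality reconstruction
        opt.zero_grad()
        loss = nn.functional.mse_loss(dec(enc(X)), X)
        loss.backward()
        opt.step()
    encoders[name] = enc

# Concatenate embeddings and feed them to any downstream model
# (clustering, survival analysis, subtype classification, ...).
z = torch.cat([encoders[n](omics[n]) for n in omics], dim=1).detach()
print(z.shape)                                          # (64, 32)
```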

Instead of directly concatenating separate latent embeddings, some groups have pursued transformation-based integration methods, which model the data jointly by mapping them to a common representation (e.g., a graph or kernel matrix). Historically, classic transformation-based ML methods use known anchor references 94 , kernel 95 , or manifold 96 methods to align multi-omics data. This is a rapidly growing area in data integration, especially for DL methods. Building off the use of anchors from classical ML, new state-of-the-art methods frequently train single-modality autoencoders, followed by an alignment procedure across modalities 97 . This direction is exciting because, once trained, the models can be used to predict an unobserved modality given a single data type. Additional exciting developments harness the power of these embedding representations together with other DL methods, including CNNs and RNNs, for wide-ranging predictive tasks, including cell fate 98 , drug response 99 , survival 92 , 100 , and clinical disease features 101 .
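
The following toy sketch illustrates the alignment idea only: two modality-specific encoders are pulled towards a shared latent space using paired samples. Real methods add reconstruction or contrastive terms to prevent trivial collapse of the embeddings, so this should be read as a schematic of the training signal, not a usable recipe.

```python
# Schematic transformation-based integration: two modality-specific encoders
# trained so that embeddings of paired samples align in a shared latent space.
# Real methods add reconstruction/contrastive losses to avoid trivial collapse.
import torch
import torch.nn as nn

enc_rna = nn.Sequential(nn.Linear(2000, 64), nn.ReLU(), nn.Linear(64, 16))
enc_atac = nn.Sequential(nn.Linear(8000, 64), nn.ReLU(), nn.Linear(64, 16))

rna = torch.randn(128, 2000)                   # paired measurements of the
atac = torch.randn(128, 8000)                  # same 128 cells/samples

opt = torch.optim.Adam(list(enc_rna.parameters()) + list(enc_atac.parameters()),
                       lr=1e-3)
for _ in range(100):
    opt.zero_grad()
    loss = nn.functional.mse_loss(enc_rna(rna), enc_atac(atac))  # align pairs
    loss.backward()
    opt.step()

# After training, either encoder maps its modality into the shared space,
# which is what enables predicting one modality from another.
print(enc_rna(rna).shape, enc_atac(atac).shape)    # both (128, 16)
```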

Perhaps the most straightforward way to integrate multi-modal data is to train individual data-modality models and then combine their results, termed model-based integration. To some degree, this is similar to the ensemble approaches frequently used in classical ML. Methods in this space take wide-ranging approaches, including building data modality-specific networks before fusing them using message-passing theory 102 or combining different data representations using a discriminative learning approach 103 . DL methods have yet to gain much momentum for model-based integration, likely because the very nature of most DL methods blurs the line between the transformation-based and model-based paradigms. Classical approaches here try to bridge data modalities by finding a common modeling space, whereas DL can naturally identify common representations and model them jointly, circumventing the need for separate modeling and integration steps. While deep neural networks will likely lead to better performance in data integration tasks, it is also important to keep in mind the limitations of DL, as well as important areas for continued research. Specifically, DL has a tendency to overfit to data. At the same time, in data integration tasks batch effects can be prevalent and it is often easy to have “contamination” between the training and test sets, all of which can lead to inflated performance estimates. Thus, it is important to carefully set up truly independent evaluation sets and identify appropriate performance baselines 3 . Furthermore, while genome-wide and whole-transcriptome datasets have broad coverage across the genome and transcriptome, human data (and in some cases, model organism data) are often skewed towards a disproportionate number of sick individuals 104 , are sex-biased towards men 105 , and are racially biased, with Europeans over-represented 106 . These biases can result in spurious associations that plague all ML methods, but they may be particularly difficult to identify when using DL.

Minor successes of DL

Phylogenetics.

A phylogeny is an evolutionary tree that models the evolutionary history of a set of taxa. The phylogeny inference problem concerns building a phylogeny from data—often molecular sequences—obtained from the set of taxa under investigation 107 . Figure  3 illustrates the phylogeny inference problem on four taxa; in this case it can be viewed as a classification problem among three possible topologies. However, classification methods have a major limitation: they cannot infer branch lengths, nor do they scale beyond a very small number of taxa, because the number of possible topologies (classes) grows super-exponentially with the number of taxa. Perhaps more importantly, classifiers such as DL models require training data, and benchmark data for which the true phylogeny is known are almost impossible to obtain in this field. Instead, simulation has been the method of choice for generating training data, but this is a major dependency, and methods are known to perform differently on simulated and biological data 108 . For more complex versions of the phylogeny inference problem, more realistic simulation protocols are needed. Finally, phylogenetic inference on a single gene is in one sense a simplified problem itself: inferring a single phylogeny from genome-wide data introduces the complication that different genes can have different histories, or that the true phylogeny might be a network 109 rather than a tree. For these reasons, DL has either had limited success or been restricted to small sub-problems apart from the main inference task.

Figure 3. The input consists of sequences (DNA sequences in this illustration) obtained from the taxa of interest. Here, the taxa are A, B, C, and D. In standard approaches, such as maximum likelihood and maximum parsimony, a generative model in the form of a tree whose leaves are labeled by the four taxa is inferred. In the recently introduced DL approach to phylogenetic inference, the problem is viewed as a classification task where the network outputs correspond to the three possible tree topologies whose leaves are labeled by the taxa A, B, C, and D.

Nonetheless, there have been attempts to use DL for the classification task described above. The Self-Organizing Tree (SOTA) algorithm 110 is a two-decade-old unsupervised hierarchical clustering method based on a neural network that classifies sequences and reconstructs phylogenetic trees from sequence data. SOTA follows the SOM (Self-Organizing Map) algorithm in growing cell structures dynamically from top to bottom until a desired (user-provided) taxonomic level is reached. More recently, CNNs have been used to infer the unrooted phylogenetic tree on four taxa (called a quartet) 111 , 112 . The authors used simulated data to train a classifier that assigns sequences to their phylogenetic tree (Fig.  3 ). However, an analysis of the method of Zou et al. 112 by Zaharias et al. 113 showed that the CNNs were less accurate than standard tree estimation methods such as maximum likelihood, maximum parsimony, and neighbor joining, both for quartet estimation and for full tree estimation, especially when the sequence length was relatively short and/or the rates of evolution were not sufficiently low. A potential workaround is to approach phylogeny inference as a graph generation problem, a more complex learning task.
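
The toy sketch below shows what casting quartet inference as a three-way classification problem looks like in code (cf. Fig. 3). The per-site convolutional encoder and the tiny alignment are illustrative stand-ins, not the architectures published in the works cited above.

```python
# Toy sketch of quartet inference as 3-way classification (cf. Fig. 3):
# one-hot encode an alignment of four taxa and score the three unrooted
# topologies AB|CD, AC|BD, AD|BC. Architecture and sizes are placeholders.
import torch
import torch.nn as nn

BASES = "ACGT"
TOPOLOGIES = ["AB|CD", "AC|BD", "AD|BC"]

def encode_alignment(seqs):                       # 4 aligned DNA sequences
    L = len(seqs[0])
    x = torch.zeros(4 * len(BASES), L)            # stack taxa along channels
    for t, s in enumerate(seqs):
        for i, b in enumerate(s):
            x[t * len(BASES) + BASES.index(b), i] = 1.0
    return x

classifier = nn.Sequential(
    nn.Conv1d(16, 64, kernel_size=1), nn.ReLU(),  # per-site (column) features
    nn.AdaptiveAvgPool1d(1), nn.Flatten(),        # pool over alignment columns
    nn.Linear(64, len(TOPOLOGIES)),
)

alignment = ["ACGTACGTAA", "ACGTACGTAC", "ACTTACGAAA", "ACTTACGAAC"]  # toy data
logits = classifier(encode_alignment(alignment).unsqueeze(0))
print(TOPOLOGIES[logits.argmax().item()])         # prediction of an untrained net
```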

Distance-based methods are another class of commonly used techniques for phylogenetic inference, among which neighbor joining is the most common, and DL has been applied to improve the distance representation. Jiang et al. 114 addressed the phylogenetic placement problem, i.e., adding a new taxon to a given tree without rebuilding the tree from scratch, by training a CNN on a simulated backbone tree and sequences. Given the backbone tree with its associated and query sequences, the model outputs an embedding of the query and reference species that can be used as input to distance-based phylogenetic placement tools, which then place the query sequences onto the reference tree. Bhattacharjee et al. 115 addressed the problem of imputing missing entries in an incomplete distance matrix using autoencoders. However, a key limitation of these methods is that trees cannot be reliably embedded into a Euclidean space of low dimension 116 . Hyperbolic space, on the other hand, has been demonstrated to be more suitable for representing data with hierarchical latent structure 117 .

Other applications have used DL to aid a more traditional inference pipeline. For example, the particular likelihood model to use for a maximum-likelihood search is often taken for granted as a user decision, but a recent method used DL to optimize this choice 118 . In another case, DL was used to aid decision-making in the tree-search algorithm of a traditional maximum likelihood heuristic. Finally, a very recent application uses a sparse learning model for almost the reverse process: given a phylogeny, it identifies the portions of a genome that most directly explain or relate to that model 119 . This can be used to validate phylogenetic inference as well as to guide downstream analyses such as hypothesis generation and testing.

A traditional problem is the inference of a perfect phylogeny, in which every site in the sequences mutates at most once along the branches of the tree. Determining whether a perfect phylogeny exists, and inferring it if one does, from binary data assumed to be error-free is polynomially solvable. However, if the data are assumed to contain errors, one approach to inferring a perfect phylogeny is to solve the minimum-flip problem: given a binary mutation matrix, in which each entry represents the presence (state 1) or absence (state 0) of a mutation in a sample at a site, that does not admit a perfect phylogeny, find the minimum number of “state flips” (from 0 to 1 or 1 to 0) so that a perfect phylogeny is admitted. Sadeqi Azer et al. 120 used an existing DL framework, originally designed for solving the traveling salesman problem, to tackle this task 121 . Here, the input consists of the inferred single-nucleotide variations (SNVs) in single cells across different sites. The output is a matrix that admits a perfect phylogeny with the minimum number of state flips from the input matrix. The input matrix is flattened and passed through convolutional layers for encoding. The encoded data are fed to a Long Short-Term Memory (LSTM) layer as a decoder. An attention layer then takes the outputs of the LSTM layer and scores the entries of the mutation matrix according to the impact that flipping them would have on minimizing the overall number of state flips. This architecture yields a probability distribution over the entries of the input matrix that is used to decide which entries to flip. The model is trained using simulated data in which the matrix and the number of flips to perform are provided. The key limitation of this approach is that there is no guarantee that the output admits a perfect phylogeny, because the cost function might not be fully optimized.
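
As a concrete anchor for the minimum-flip objective, the sketch below implements the classical three-gamete test, which decides in polynomial time whether a binary mutation matrix (with ancestral state 0) already admits a perfect phylogeny; the learned approach described above searches for a small set of flips when this test fails. The example matrix is our own illustration.

```python
# The classical three-gamete test: a binary mutation matrix (rows = cells or
# samples, columns = sites, ancestral state 0) admits a perfect phylogeny iff
# no pair of columns contains all three patterns (0,1), (1,0), (1,1).
import numpy as np
from itertools import combinations

def admits_perfect_phylogeny(M):
    for i, j in combinations(range(M.shape[1]), 2):
        pairs = {tuple(p) for p in M[:, [i, j]]}
        if {(0, 1), (1, 0), (1, 1)} <= pairs:
            return False                      # columns i and j conflict
    return True

M = np.array([[1, 0, 0],
              [1, 1, 0],
              [0, 1, 1]])
print(admits_perfect_phylogeny(M))            # False: columns 1 and 2 conflict

M[2, 1] = 0                                   # a single state flip removes the conflict
print(admits_perfect_phylogeny(M))            # True
```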

Taken together, these related successes are impressive, but given the challenges outlined above it is difficult to envision an end-to-end DL model that directly estimates phylogenetic trees from raw data in the near future. And if one were developed, given its reliance on (likely simulated) training data, its applicability to actual biological sequences would need to be carefully validated before traditional phylogenetic methods are displaced.

General challenges for DL in the biosciences

Not all applications of DL have been equally successful in computational biology. While DL has found major success in areas such as protein structure prediction and genome editing, it has faced major hurdles in others, such as phylogenetic inference (Table  1 ). The most common issues faced by DL approaches stem from the lack of annotated data, the inherent absence of ground truth for non-simulated datasets, severe discrepancies between the training data distribution and the real-world test (e.g., clinical) data distribution, difficulties in benchmarking and interpreting results, and the biases and ethical issues present in datasets and models. Additionally, as data and DL models grow, training efficiency has become a major bottleneck for progress.

Specifically, the success of DL in different subareas of computational biology relies heavily on the availability and diversity of standardized supervised and unsupervised datasets, ML benchmarks with clear biological impact, the computational nature of the problem, and the software engineering infrastructure needed to train DL models. The remaining challenges for DL in computational biology are tied to improving model explainability, extracting actionable and human-understandable insights, boosting efficiency and limiting training costs, and mitigating the growing ethical issues surrounding DL models; innovative solutions are emerging in the DL and computational biology communities (Table  2 ). We now review two key areas for improvement: (i) explainability and (ii) training efficiency.

Explainability

Perhaps one of the most critical limitations of DL models today, especially for biological and clinical applications, is that they are not as explainable as the simpler regression models of statistics: it is challenging to explain what each node of the network represents and how important it is to model performance. The highly nonlinear decision boundaries of DNNs and their overparameterized nature, which enable them to achieve high prediction accuracy, also make them hard to explain. This lack of explainability is an important issue in computational biology because the trustworthiness of DNNs is arguably one of the most pressing problems in biological and sensitive clinical decision-making applications. In fact, in biology the question of why a model predicts well is often as important as how accurately it predicts a phenomenon. For example, in protein structure/function prediction we would like to know what rules in a predictive model govern the 3D geometry of a protein and its properties; in genome editing we aim to understand the biological DNA repair processes inferred by CRISPR models; in systems biology we aim to know the specific molecular differences that give rise to different phenotypes; and in phylogenetics we aim to know the features that enable us to infer a phylogenetic tree. Addressing these questions is key to producing biological knowledge and creating actionable decisions in clinical settings.

There have been efforts in the ML community over the past few years to develop methods that explain “black-box” DL models 122 . Earlier works were developed for computer vision and biomedical applications, and some have since been applied to problems in computational biology. Activation maximization is a large class of algorithms that search for an input maximizing the model response, typically via gradient-based optimization 123 , 124 ; the idea is to generate an input that best symbolizes an outcome. To make the results human-interpretable, the input is regularized using closed-form density functions of the data or GANs that mimic the data distribution. Other methods address the explainability question more directly, gaining insight into the NN function through its Taylor expansion 125 or Fourier transform 42 , 62 . The explanation takes the form of a heatmap showing the importance of each input feature. Sensitivity analysis is another popular method of this sort, which uses backpropagation to find the input features to which the output is most sensitive 126 ; this has also been used for classification and diagnostic prediction of cancers using DNNs and gene expression profiling 127 . LIME 128 is a popular sensitivity analysis method that learns an interpretable model locally around the prediction. Simonyan et al. 124 proposed using the gradient of the output with respect to the pixels of an input image to compute a saliency map of the image. To avoid the saturation effect in perturbation-based and gradient-based approaches, DeepLIFT 129 decomposes the output prediction of a neural network on a specific input by backpropagating the contributions of all neurons in the network to every feature of the input. SHAP 130 unifies these approaches in a theoretically grounded method that assigns each feature an importance value for a particular prediction. Finally, GNNExplainer 131 is a new approach in a family of methods that provide interpretable explanations for predictions of GNN-based models on graph-based DL tasks. Given an instance, GNNExplainer identifies a compact subgraph structure and a small subset of node features that play a crucial role in the GNN’s prediction.
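
A minimal sketch of gradient-based saliency in the spirit of Simonyan et al.: backpropagating a model's output to its input features ranks how sensitive the prediction is to each feature. The model and data below are placeholder stand-ins, not any of the cited biological models.

```python
# Minimal gradient-based saliency: the absolute gradient of the prediction
# with respect to each input feature ranks how sensitive the output is to
# that feature. Model and data are placeholders.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(50, 32), nn.ReLU(), nn.Linear(32, 1))
x = torch.randn(1, 50, requires_grad=True)     # e.g. one expression profile

score = model(x).sum()
score.backward()                               # d(prediction) / d(input)
saliency = x.grad.abs().squeeze(0)

top = torch.topk(saliency, k=5).indices
print("most influential input features:", top.tolist())
```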

Efforts towards developing tools for explaining DNNs are still in their infancy and are growing rapidly; challenges still abound on the way to fully explainable systems in biology. The key problem is that current general-purpose methods for explaining DL models are not sufficient, especially in clinical settings 132 . For scientists and clinicians to trust these black-box models, the models need to be able to explain themselves in a human-understandable fashion with a quantifiable level of uncertainty, summarize the reasons for their behaviour, and suggest the additional steps (e.g., experiments, clinical studies) required to reliably defend their decisions. We speculate that the next generation of explainability methods will focus on helping these black-box models transition from hypothesis-generating machines into hypothesis-testing ones that can communicate more easily with medical practitioners.

Training efficiency

Despite the high accuracy of many DL approaches, their performance often comes at a high monetary and computational cost. For example, the power and computation time consumed in training a single model is estimated to cost up to hundreds of thousands of US dollars 133 . The extreme costs of large DL models can prevent the broader research community from reproducing and improving upon current results. It is therefore practical to consider lower-cost alternatives that are available and feasible for researchers with more modest resources. These issues are directly relevant to applying DL in computational biology: for instance, training the state-of-the-art protein structure prediction model AlphaFold2 requires computational resources equivalent to 100–200 GPUs running for a few weeks 21 . In the following paragraphs, we discuss common strategies used by the DL community to decrease the memory and computation costs of training, and potential directions for applying similar strategies to improve the efficiency of DL models in computational biology.

The most direct way to reduce the training cost of a DL method is to perform transfer learning from an available pretrained general model instead of training the new model from scratch. This is a common approach for training DL models on NLP tasks, where general language models have been shown to be a good starting point for many different tasks 134 . The approach can be adopted in computational biology if downstream tasks can start from a general model trained on biological data. For example, Zaheer et al. 135 trained a general human DNA sequence model based on the human reference genome GRCh37 with self-supervised learning (masked DNA sequence prediction and next DNA sequence segment prediction). They then showed successful downstream performance (promoter region prediction) solely by applying transfer learning to the general model. Using pretrained models greatly decreases (i) the size of the task-specific datasets needed for training and (ii) the total amount of local training needed for the tasks researchers are interested in. Thus, creating general models that can be shared and used by the entire research community would greatly reduce the resources needed by individual groups to train models for specific tasks. However, this approach is less useful if the data distribution of a downstream task differs drastically from the data used to train the general pretrained model. For instance, DeepVariant has limited applicability to non-human SNV calling owing to the differences between diploid and haploid genomes and in nucleic acid distributions 4 . In such cases, we still need to train from scratch or spend a significant amount of resources re-training the base model.
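
A generic sketch of this recipe, with a randomly initialized stand-in for the pretrained encoder: the base is frozen and only a small task head is trained. The data, dimensions, and downstream task are placeholders.

```python
# Generic transfer-learning recipe: freeze a pretrained encoder and train only
# a small task head (e.g. a binary promoter-region classifier). The encoder
# below is a randomly initialized stand-in for an actual pretrained model.
import torch
import torch.nn as nn

pretrained_encoder = nn.Sequential(            # stand-in for a shared base model
    nn.Linear(1024, 256), nn.ReLU(), nn.Linear(256, 128)
)
for p in pretrained_encoder.parameters():
    p.requires_grad = False                    # keep pretrained weights fixed

task_head = nn.Linear(128, 2)                  # binary downstream task
opt = torch.optim.Adam(task_head.parameters(), lr=1e-3)

X, y = torch.randn(256, 1024), torch.randint(0, 2, (256,))   # placeholder data
for _ in range(20):
    opt.zero_grad()
    logits = task_head(pretrained_encoder(X))
    loss = nn.functional.cross_entropy(logits, y)
    loss.backward()
    opt.step()

print("fine-tuned only",
      sum(p.numel() for p in task_head.parameters()), "parameters")
```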

An alternative approach is to design DL model architectures with improved efficiency. The CNN is one of the most widely studied architectures in DL, and numerous low-cost variants have been proposed; popular examples of efficient CNN architectures include the MobileNet family 136 , DenseNet 137 , EfficientNet 138 , and CSPNet 139 . Similarly, numerous efficiency-oriented architectural modifications have been proposed for the transformer model, many of which aim to reduce the quadratic computational complexity incurred by the self-attention mechanism 140 . Some transformer variants also explore parameter sharing and factorization to reduce the memory cost of model training 141 . Going further, efficient architectural variants have been discovered for RNNs 142 and graph neural networks (GNNs) 143 , 144 , including specialized architectures tuned for better efficiency within the biological domain 145 .

For computational biology applications, one approach for boosting efficiency relies on exploiting the inherent sparsity and locality of biological data (e.g., focusing only on the SNV calls rather than the whole genome 146 ). Researchers are also using transformers for DNA/RNA sequence modeling 135 , but transformer models have high training costs owing to the expensive global attention mechanism; prior domain expertise can be leveraged here to prune attention neighborhoods and thereby improve training efficiency. Finally, one can also change a model’s architecture during training to adaptively improve training efficiency. The practice of model pruning, which removes unimportant parameters from the model, has become a popular way of deriving lightweight DL models for deployment 147 .
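
As a toy illustration of the magnitude pruning mentioned above, the sketch below zeroes out the smallest-magnitude weights of a single layer. Production workflows (e.g., torch.nn.utils.prune) add masks, pruning schedules, and retraining; the layer size and sparsity level here are arbitrary.

```python
# Toy magnitude pruning: zero out the smallest-magnitude weights of a layer.
# Real pruning workflows add masks, schedules and retraining; this only
# illustrates the basic idea.
import torch
import torch.nn as nn

layer = nn.Linear(512, 512)
sparsity = 0.9                                  # drop 90% of the weights

with torch.no_grad():
    w = layer.weight
    threshold = w.abs().flatten().kthvalue(int(sparsity * w.numel())).values
    mask = (w.abs() > threshold).float()
    w.mul_(mask)                                # prune in place

kept = int(mask.sum().item())
print(f"kept {kept}/{w.numel()} weights ({kept / w.numel():.1%})")
```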

As the amount of biological data keeps increasing, the size of neural networks will increase as well, leading to a higher total number of training iterations required for convergence. It is therefore natural to explore dataset reduction strategies as one solution to the efficiency challenge. One feasible proposal is to construct coresets of the training dataset 148 . This can be done by clustering the dataset and choosing the centroids as representatives. Alternatively, dataset condensation can be achieved by selecting the data samples that best approximate the effect of training the model on the whole dataset. An orthogonal way to address the high training cost of DL is to distribute training across several cheap, low-end devices, which decreases the total training time and can lower the total budget by using multiple inexpensive devices with less computation power. In general, the major distributed training methods are data parallelism, model parallelism, and hybrid parallel training. Data-parallel training splits and distributes parts of the dataset to each device 149 , whereas model-parallel training splits and distributes parts of the model to each device 150 . Because all of the above methods are task-agnostic, they can be readily applied to DL models in computational biology.
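
A small sketch of the clustering-based reduction described above, assuming placeholder data: the training set is clustered and the sample nearest to each centroid is kept as its representative. This is a heuristic illustration, not a coreset construction with formal guarantees.

```python
# Sketch of a clustering-based reduced training set: cluster the data and keep
# the sample nearest to each centroid as its representative. Heuristic only;
# principled coreset constructions come with approximation guarantees.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 50))                # placeholder training features

k = 100                                        # size of the reduced dataset
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)

dists = km.transform(X)                        # (n_samples, k) distances to centroids
coreset_idx = dists.argmin(axis=0)             # nearest sample to each centroid
coreset = X[coreset_idx]
print(coreset.shape)                           # (100, 50): train on this subset
```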

Concluding comments

In summary, while the success of DL in areas such as protein structure prediction has been paradigm shifting, other areas such as function prediction, genome engineering, and multi-omics are also seeing rapid gains in performance compared to traditional approaches. In other areas, such as phylogenetics, classical computational approaches still seem to have the upper hand. Further advances in DL applied to challenges across the biosciences will increasingly leverage domain-specific biological knowledge while striving for high explainability and improved efficiency.

Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).

LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521 , 436–444 (2015).

Whalen, S., Schreiber, J., Noble, W. S. & Pollard, K. S. Navigating the pitfalls of applying machine learning in genomics. Nat. Rev. Genet. 23 , 169–181 (2022).

Poplin, R. et al. A universal SNP and small-indel variant caller using deep neural networks. Nat. Biotechnol. 36, 983–987 (2018).

Arango-Argoty, G. et al. Deeparg: a deep learning approach for predicting antibiotic resistance genes from metagenomic data. Microbiome 6 , 1–15 (2018).

Nissen, J. N. et al. Improved metagenome binning and assembly using deep variational autoencoders. Nat. Biotechnol. 39 , 555–560 (2021).

Nielsen, A. A. & Voigt, C. A. Deep learning to predict the lab-of-origin of engineered DNA. Nat. Commun. 9 , 1–10 (2018).

Pearce, R. & Zhang, Y. Toward the solution of the protein structure prediction problem. J. Biol. Chem. 297, 100870 (2021).

AlQuraishi, M. Machine learning in protein structure prediction. Curr. Opin. Chem. Biol. 65 , 1–8 (2021).

de Chadarevian, S. John Kendrew and myoglobin: Protein structure determination in the 1950s. Protein Sci. 27 , 1136–1143 (2018).

Stollar, E. J. & Smith, D. P. Uncovering protein structure. Essays Biochem. 64 , 649–680 (2020).

Srivastava, A., Nagai, T., Srivastava, A., Miyashita, O. & Tama, F. Role of computational methods in going beyond X-ray crystallography to explore protein structure and dynamics. Int. J. Mol. Sci. 19 , 3401 (2018).

Wang, S., Sun, S., Li, Z., Zhang, R. & Xu, J. Accurate de novo prediction of protein contact map by ultra-deep learning model. PLoS Computational Biol. 13 , e1005324 (2017).

Zheng, W. et al. Deep-learning contact-map guided protein structure prediction in CASP13. Proteins: Struct. Funct. Bioinforma. 87 , 1149–1164 (2019).

Baek, M. et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science. 373 , 871–876 (2021).

Mirabello, C. & Wallner, B. RAWMSA: End-to-end deep learning using raw multiple sequence alignments. PloS One 14 , e0220182 (2019).

Tunyasuvunakool, K. et al. Highly accurate protein structure prediction for the human proteome. Nature https://doi.org/10.1038/s41586-021-03828-1 (2021).

AlQuraishi, M. AlphaFold at CASP13. Bioinformatics. 35 , 4862–4865 (2019).

Ingraham, J., Riesselman, A., Sander, C. & Marks, D. Learning protein structure with a differentiable simulator. In International Conference on Learning Representations (2018).

AlQuraishi, M. End-to-end differentiable learning of protein structure. Cell Syst. 8 , 292–301 (2019).

Senior, A. W. et al. Improved protein structure prediction using potentials from deep learning. Nature 577 , 706–710 (2020).

Berman, H. M. et al. The protein data bank. Nucleic Acids Res. 28 , 235–242 (2000).

Madeira, F. et al. The EMBL-EBI search and sequence analysis tools APIs in 2019. Nucleic Acids Res. 47 , W636–W641 (2019).

Bonetta, R. & Valentino, G. Machine learning techniques for protein function prediction. Proteins: Struct. Funct. Bioinforma. 88 , 397–413 (2020).

Huntley, R. P. et al. The GOA database: gene ontology annotation updates for 2015. Nucleic Acids Res. 43 , D1057–D1063 (2015).

Zhang, M.-L. & Zhou, Z.-H. A review on multi-label learning algorithms. IEEE Trans. Knowl. Data Eng. 26 , 1819–1837 (2013).

Consortium, U. UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res. 47 , D506–D515 (2019).

Szklarczyk, D. et al. STRING v11: protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res. 47 , D607–D613 (2019).

Mistry, J. et al. Pfam: The protein families database in 2021. Nucleic Acids Res. 49 , D412–D419 (2021).

Kulmanov, M., Khan, M. A. & Hoehndorf, R. DeepGO: predicting protein functions from sequence and interactions using a deep ontology-aware classifier. Bioinformatics. 34 , 660–668 (2018).

Alshahrani, M. et al. Neuro-symbolic representation learning on biological knowledge graphs. Bioinformatics. 33 , 2723–2730 (2017).

Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215 , 403–410 (1990).

Kulmanov, M. & Hoehndorf, R. DeepGOPlus: improved protein function prediction from sequence. Bioinformatics. 36 , 422–429 (2020).

CAS   PubMed   Google Scholar  

Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nat. Methods. 12 , 59–60 (2015).

Chicco, D., Sadowski, P. & Baldi, P. Deep autoencoder neural networks for gene ontology annotation predictions. In Proceedings of the 5th ACM Conference On Bioinformatics, Computational Biology, and Health Informatics , 533–540 (2014).

Miranda, L. J. & Hu, J. A deep learning approach based on stacked denoising autoencoders for protein function prediction. In 2018 IEEE 42nd Annual Computer Software and Applications Conference (COMPSAC ), vol. 1, 480–485 (IEEE, 2018).

Gligorijević, V., Barot, M. & Bonneau, R. deepNF: deep network fusion for protein function prediction. Bioinformatics. 34 , 3873–3881 (2018).

Zou, Z., Tian, S., Gao, X. & Li, Y. mlDEEpre: multi-functional enzyme function prediction with hierarchical multi-label deep learning. Front. Genet. 9 , 714 (2019).

Li, S. et al. Deep learning-based prediction of species-specific protein S-glutathionylation sites. Biochim. Biophys. Acta Proteins Proteom. 1868 , 140422 (2020).

Mazurenko, S., Prokop, Z. & Damborsky, J. Machine learning in enzyme engineering. ACS Catal. 10 , 1210–1223 (2019).

Zhang, F. et al. Deepfunc: a deep learning framework for accurate prediction of protein functions from protein sequences and interactions. Proteomics. 19 , 1900019 (2019).

Aghazadeh, A. et al. Epistatic net allows the sparse spectral regularization of deep neural networks for inferring fitness functions. Nat. Commun. 12 , 1–10 (2021).

Brookes, D. H., Aghazadeh, A. & Listgarten, J. On the sparsity of fitness functions and implications for learning. In Proceedings of the National Academy of Sciences 119 (2022). https://www.pnas.org/content/119/1/e2109649118 .

van Overbeek, M. et al. DNA repair profiling reveals nonrandom outcomes at Cas9-mediated breaks. Mol. Cell. 63 , 633–646 (2016).

Article   PubMed   Google Scholar  

Cui, Y., Xu, J., Cheng, M., Liao, X. & Peng, S. Review of CRISPR/Cas9 sgRNA design tools. Interdiscip. Sci. Computational Life Sci. 10 , 455–465 (2018).

Xue, L., Tang, B., Chen, W. & Luo, J. Prediction of CRISPR sgRNA activity using a deep convolutional neural network. J. Chem. Inf. Modeling. 59 , 615–624 (2018).

Kim, H. K. et al. SpCas9 activity prediction by DeepSpCas9, a deep learning–based model with high generalization performance. Sci. Adv. 5 , eaax9249 (2019).

Chuai, G. et al. DeepCRISPR: optimized CRISPR guide RNA design by deep learning. Genome Biol. 19 , 1–18 (2018).

Liu, Q., Cheng, X., Liu, G., Li, B. & Liu, X. Deep learning improves the ability of sgRNA off-target propensity prediction. BMC Bioinforma. 21 , 1–15 (2020).

Wang, D. et al. Optimized CRISPR guide RNA design for two high-fidelity Cas9 variants by deep learning. Nat. Commun. 10 , 1–14 (2019).

ADS   Google Scholar  

Liu, Q., He, D. & Xie, L. Prediction of off-target specificity and cell-specific fitness of CRISPR-Cas system using attention boosted deep learning and network-based gene feature. PLoS Computational Biol. 15 , e1007480 (2019).

Article   CAS   ADS   Google Scholar  

Zhang, G., Zeng, T., Dai, Z. & Dai, X. Prediction of CRISPR/Cas9 single guide RNA cleavage efficiency and specificity by attention-based convolutional neural networks. Computational Struct. Biotechnol. J. 19 , 1445–1457 (2021).

Metsky, H. C. et al. Designing sensitive viral diagnostics with machine learning. Nat. Biotechnol. https://doi.org/10.1038/s41587-022-01213-5 (2022).

Leenay, R. T. et al. Large dataset enables prediction of repair after CRISPR–Cas9 editing in primary T cells. Nat. Biotechnol. 37 , 1034–1037 (2019).

Allen, F. et al. Predicting the mutations generated by repair of Cas9-induced double-strand breaks. Nat. Biotechnol. 37 , 64–72 (2019).

Shen, M. W. et al. Predictable and precise template-free CRISPR editing of pathogenic variants. Nature . 563 , 646–651 (2018).

Li, V. R., Zhang, Z. & Troyanskaya, O. G. CROTON: an automated and variant-aware deep learning framework for predicting CRISPR/Cas9 editing outcomes. Bioinformatics. 37 , i342–i348 (2021).

Kim, H. K. et al. Deep learning improves prediction of CRISPR–Cpf1 guide RNA activity. Nat. Biotechnol. 36 , 239 (2018).

Song, M. et al. Sequence-specific prediction of the efficiencies of adenine and cytosine base editors. Nat. Biotechnol. 38 , 1037–1043 (2020).

Kim, H. K. et al. Predicting the efficiency of prime editing guide RNAs in human cells. Nat. Biotechnol. 39 , 198–206 (2021).

Anzalone, A. V., Koblan, L. W. & Liu, D. R. Genome editing with CRISPR–Cas nucleases, base editors, transposases and prime editors. Nat. Biotechnol. 38 , 824–844 (2020).

Aghazadeh, A., Ocal, O. & Ramchandran, K. CRISPRLand: Interpretable large-scale inference of DNA repair landscape based on a spectral approach. Bioinformatics. 36 , i560–i568 (2020).

Sun, B., Feng, J. & Saenko, K. Return of frustratingly easy domain adaptation. In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016).

Angermueller, C., Pärnamaa, T., Parts, L. & Stegle, O. Deep learning for computational biology. Mol. Syst. Biol. 12 , 878 (2016).

NCBI Resource Coordinators. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 46 , D8-D13 (2018).

Athar, A. et al. ArrayExpress update - from bulk to single-cell expression data. Nucleic Acids Res. 47 , D711–D715 (2019).

Perez-Riverol, Y. et al. The PRIDE database and related tools and resources in 2019: improving support for quantification data. Nucleic Acids Res. 47 , D442–D450 (2019).

Grossman, R. L. et al. Toward a shared vision for cancer genomic data. N. Engl. J. Med. 375 , 1109–1112 (2016).

Barretina, J. et al. The Cancer cell line encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 483 , 603–607 (2012).

Lonsdale, J. et al. The genotype-tissue expression (GTEx) project. Nat. Genet. 45 , 580–585 (2013).

Consortium, E. P. et al. The ENCODE (ENCyclopedia of DNA elements) project. Science. 306 , 636–640 (2004).

Johnson, W. E., Li, C. & Rabinovic, A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics. 8 , 118–127 (2007).

Article   PubMed   MATH   Google Scholar  

Ritchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43 , e47–e47 (2015).

Leek, J. T. Svaseq: removing batch effects and other unwanted noise from sequencing data. Nucleic Acids Res. 42 , e161–e161 (2014).

Risso, D., Ngai, J., Speed, T. P. & Dudoit, S. Normalization of RNA-seq data using factor analysis of control genes or samples. Nat. Biotechnol. 32 , 896–902 (2014).

Zhu, Q. et al. Targeted exploration and analysis of large cross-platform human transcriptomic compendia. Nat. Methods. 12 , 211–214 (2015).

Wong, A. K., Krishnan, A. & Troyanskaya, O. G. GIANT 2.0: genome-scale integrated analysis of gene networks in tissues. Nucleic Acids Res. 46 , W65–W70 (2018).

Yao, V. et al. An integrative tissue-network approach to identify and test human disease genes. Nat. Biotechnol. 36 , 1091–1099 (2018).

Ellis, S. E., Collado-Torres, L., Jaffe, A. & Leek, J. T. Improving the value of public RNA-seq expression data by phenotype prediction. Nucleic Acids Res. 46 , e54 (2018).

Hie, B., Bryson, B. & Berger, B. Efficient integration of heterogeneous single-cell transcriptomes using Scanorama. Nat. Biotechnol. 37 , 685–691 (2019).

Butler, A., Hoffman, P., Smibert, P., Papalexi, E. & Satija, R. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat. Biotechnol. 36 , 411–420 (2018).

Korsunsky, I. et al. Fast, sensitive and accurate integration of single-cell data with Harmony. Nat. Methods. 16 , 1289–1296 (2019).

Lopez, R., Regier, J., Cole, M. B., Jordan, M. I. & Yosef, N. Deep generative modeling for single-cell transcriptomics. Nat. Methods. 15 , 1053–1058 (2018).

Amodio, M. et al. Exploring single-cell data with deep multitasking neural networks. Nat. Methods 16 , 1139–1145 (2019).

Eraslan, G., Simon, L. M., Mircea, M., Mueller, N. S. & Theis, F. J. Single-cell RNA-seq denoising using a deep count autoencoder. Nat. Commun. 10 , 1–14 (2019).

Angermueller, C., Lee, H. J., Reik, W. & Stegle, O. DeepCpG: accurate prediction of single-cell DNA methylation states using deep learning. Genome Biol. 18 , 1–13 (2017).

Google Scholar  

Xiong, L. et al. SCALE method for single-cell ATAC-seq analysis via latent feature extraction. Nat. Commun. 10 , 1–10 (2019).

Ritchie, M. D., Holzinger, E. R., Li, R., Pendergrass, S. A. & Kim, D. Methods of integrating data to uncover genotype-phenotype interactions. Nat. Rev. Genet. 16 , 85–97 (2015).

Wang, H., Lengerich, B. J., Aragam, B. & Xing, E. P. Precision Lasso: accounting for correlations and linear dependencies in high-dimensional genomic data. Bioinformatics. 35 , 1181–1187 (2019).

Li, Z., Chang, C., Kundu, S. & Long, Q. Bayesian generalized biclustering analysis via adaptive structured shrinkage. Biostatistics. 21 , 610–624 (2020).

Article   MathSciNet   PubMed   Google Scholar  

Argelaguet, R. et al. Multi-omics factor analysis – a framework for unsupervised integration of multi-omics data sets. Mol. Syst. Biol. 14 , e8124 (2018).

Chaudhary, K., Poirion, O. B., Lu, L. & Garmire, L. X. Deep learning-based multi-omics integration robustly predicts survival in liver cancer. Clin. Cancer Res. 24 , 1248–1259 (2018).

Tong, L., Mitchel, J., Chatlin, K. & Wang, M. D. Deep learning based feature-level integration of multi-omics data for breast cancer patients survival analysis. BMC Med. Inform. Decis. Mak. 20 , 1–12 (2020).

Stuart, T. et al. Comprehensive integration of single-cell data. Cell. 177 , 1888–1902 (2019).

Mariette, J. & Villa-Vialaneix, N. Unsupervised multiple kernel learning for heterogeneous data integration. Bioinformatics. 34 , 1009–1015 (2018).

Welch, J. D., Hartemink, A. J. & Prins, J. F. MATCHER: manifold alignment reveals correspondence between single cell transcriptome and epigenome dynamics. Genome Biol. 18 , 1–19 (2017).

Wu, K. E., Yost, K. E., Chang, H. Y. & Zou, J. BABEL enables cross-modality translation between multiomic profiles at single-cell resolution. Proc Natl Acad. Sci. 118, e2023070118 (2021).

Buggenthin, F. et al. Prospective identification of hematopoietic lineage choice by deep learning. Nat. Methods. 14 , 403–406 (2017).

Sharifi-Noghabi, H., Zolotareva, O., Collins, C. C. & Ester, M. MOLI: multi-omics late integration with deep neural networks for drug response prediction. Bioinforma. 35 , i501–i509 (2019).

Ma, T. & Zhang, A. multi-view factorization autoencoder with network constraints for multi-omic integrative analysis. In 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) , 702-707 (IEEE, 2018).

Lee, G., Nho, K., Kang, B., Sohn, K.-A. & Kim, D. Predicting Alzheimer’s disease progression using multi-modal deep learning approach. Sci. Rep. 9 , 1952 (2019).

Article   PubMed   PubMed Central   ADS   Google Scholar  

Wang, B. et al. Similarity network fusion for aggregating data types on a genomic scale. Nat. Methods. 11 , 333–337 (2014).

Keilwagen, J., Posch, S. & Grau, J. Accurate prediction of cell type-specific transcription factor binding. Genome Biol. 20 , 1–17 (2019).

Zollner, S. & Pritchard, J. K. Overcoming the winner’s curse: estimating penetrance parameters from case-control data. Am. J. Hum. Genet. 80 , 605–615 (2007).

Beery, A. K. & Zucker, I. Sex bias in neuroscience and biomedical research. Neurosci. Biobehav. Rev. 35 , 565–572 (2011).

Carlson, C. S. et al. Generalization and dilution of association results from European GWAS in populations of non-European ancestry: the PAGE study. PLoS Biol. 11 , e1001661 (2013).

Felsenstein, J. Inferring Phylogenies , vol. 2 (Sinauer Associates Sunderland, MA, 2004).

Nute, M., Saleh, E. & Warnow, T. Evaluating statistical multiple sequence alignment in comparison to other alignment methods on protein data sets. Syst. Biol. 68 , 396–411 (2018).

Nakhleh, L. In Problem solving handbook in computational biology and bioinformatics , 125–158 (Springer, 2010).

Dopazo, J. & Carazo, J. M. Phylogenetic reconstruction using an unsupervised growing neural network that adopts the topology of a phylogenetic tree. J. Mol. evolution. 44 , 226–233 (1997).

Suvorov, A., Hochuli, J. & Schrider, D. R. Accurate inference of tree topologies from multiple sequence alignments using deep learning. Syst. Biol. 69 , 221–233 (2020).

Zou, Z., Zhang, H., Guan, Y. & Zhang, J. Deep residual neural networks resolve quartet molecular phylogenies. Mol. Biol. Evolution 37 , 1495–1507 (2020).

Zaharias, P., Grosshauser, M. & Warnow, T. Re-evaluating deep neural networks for phylogeny estimation: the issue of taxon sampling J Comput Biol 29 , 74-89 (2021).

Jiang, Y., Balaban, M., Zhu, Q. & Mirarab, S. DEPP: Deep learning enables extending species trees using single genes. https://doi.org/10.1101/2021.01.22.427808 (2021).

Bhattacharjee, A. & Bayzid, M. S. Machine learning based imputation techniques for estimating phylogenetic trees from incomplete distance matrices. BMC genomics. 21 , 1–14 (2020).

Linial, N., London, E. & Rabinovich, Y. The geometry of graphs and some of its algorithmic applications. Combinatorica. 15 , 215–245 (1995).

Article   MathSciNet   MATH   Google Scholar  

Nickel, M. & Kiela, D. Poincaré embeddings for learning hierarchical representations. Adv. Neural Inf. Process. Syst. 30 , 6338–6347 (2017).

Abadi, S., Avram, O., Rosset, S., Pupko, T. & Mayrose, I. ModelTeller: model selection for optimal phylogenetic reconstruction using machine learning. Mol. Biol. Evolution 37 , 3338–3352 (2020).

Kumar, S. & Sharma, S. Evolutionary sparse learning for phylogenomics. Mol. Biol. Evolution. 38 , 4674–4682 (2021).

Azer, E. S., Ebrahimabadi, M. H., Malikić, S., Khardon, R. & Sahinalp, S. C. Tumor phylogeny topology inference via deep learning. iScience. 23 , 101655 (2020).

Bello, I., Pham, H., Le, Q. V., Norouzi, M. & Bengio, S. Neural combinatorial optimization with reinforcement learning. In Workshop at International Conference on Learning Representations, ICLR’17. (2017).

Montavon, G., Samek, W. & Müller, K.-R. Methods for interpreting and understanding deep neural networks. Digital Signal Process. 73 , 1–15 (2018).

Article   MathSciNet   Google Scholar  

Berkes, P. & Wiskott, L. On the analysis and interpretation of inhomogeneous quadratic forms as receptive fields. Neural Comput. 18 , 1868–1895 (2006).

Article   MathSciNet   PubMed   MATH   Google Scholar  

Simonyan, K., Vedaldi, A. & Zisserman, A. Deep inside convolutional networks: Visualising image classification models and saliency maps. In Workshop at International Conference on Learning Representations (2014).

Bach, S. et al. On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PloS One 10 , e0130140 (2015).

Zurada, J. M., Malinowski, A. & Cloete, I. Sensitivity analysis for minimization of input data dimension for feedforward neural network. In Proceedings of IEEE International Symposium on Circuits and Systems-ISCAS’94 , vol. 6, 447–450 (IEEE, 1994).

Khan, J. et al. Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nat. Med. 7 , 673–679 (2001).

Ribeiro, M. T., Singh, S. & Guestrin, C. "Why should I trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining , 1135–1144 (2016).

Shrikumar, A., Greenside, P. & Kundaje, A. Learning important features through propagating activation differences. In International Conference on Machine Learning, 3145–3153 (PMLR, 2017).

Lundberg, S. M. & Lee, S.-I. A unified approach to interpreting model predictions. In Proceedings of the 31st International Conference on Neural Information Processing Systems , 4768–4777 (2017).

Ying, R., Bourgeois, D., You, J., Zitnik, M. & Leskovec, J. GNNExplainer: Generating explanations for graph neural networks. Adv. Neural Inf. Process. Syst. 32 , 9240 (2019).

PubMed   PubMed Central   Google Scholar  

Gilpin, L. H. et al. Explaining explanations: An overview of interpretability of machine learning. In 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA) , 80–89 (IEEE, 2018).

Yang, Z. et al. Xlnet: Generalized autoregressive pretraining for language understanding. Adv. Neural Inf. Process. Syst. 32 (2019).

Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies , Volume 1 (Long and Short Papers), 4171-4186 (Association for Computational Linguistics, Minneapolis, Minnesota, 2019). https://aclanthology.org/N19-1423 .

Zaheer, M. et al. Big bird: Transformers for longer sequences. In Advances in Neural Information Processing Systems ( NeurIPS ), 33 , 17283–17297 (2020).

Sandler, M., Howard, A., Zhu, M., Zhmoginov, A. & Chen, L.-C. Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , 4510–4520 (2018).

Huang, G., Liu, Z., Van Der Maaten, L. & Weinberger, K. Q. Densely connected convolutional networks. In Proceedings of The IEEE Conference on Computer Vision and Pattern Recognition , 4700–4708 (2017).

Tan, M. & Le, Q. Efficientnet: Rethinking model scaling for convolutional neural networks. In International Conference on Machine Learning, 6105-6114 (PMLR, 2019).

Wang, C.-Y. et al. Cspnet: A new backbone that can enhance learning capability of CNN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 390–391 (2020).

Wu, Z., Liu, Z., Lin, J., Lin, Y. & Han, S. Lite transformer with long-short range attention. In International Conference on Learning Representations ( 2019).

Lan, Z. et al. ALBERT: A lite BERT for self-supervised learning of language representations. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26–30, 2020 (2020).

Kusupati, A. et al. Fastgrnn: A fast, accurate, stable and tiny kilobyte sized gated recurrent neural network. Adv. Neural Inf. Process. Syst. 31 , 9031–9042 (2018).

Chiang, W.-L. et al. Cluster-GCN: An efficient algorithm for training deep and large graph convolutional networks. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining , 257–266 (2019).

Zou, D. et al. Layer-dependent importance sampling for training deep and large graph convolutional networks. Adv. Neural Inf. Process. Syst. 32 , 11249–11259 (2019).

Pouladi, F., Salehinejad, H. & Gilani, A. M. Recurrent neural networks for sequential phenotype prediction in genomics. In 2015 International Conference on Developments of E-Systems Engineering (DeSE) , 225–230 (IEEE, 2015).

Ke, Z. & Vikalo, H. A convolutional auto-encoder for haplotype assembly and viral quasispecies reconstruction. Adv. Neural Inf. Process. Syst. 33 , 13493–13503 (2020).

Liu, Z., Sun, M., Zhou, T., Huang, G. & Darrell, T. Rethinking the value of network pruning. In International Conference on Learning Representations (2018).

Mirzasoleiman, B., Bilmes, J. & Leskovec, J. Coresets for data-efficient training of machine learning models. In International Conference on Machine Learning , 6950–6960 (PMLR, 2020).

Lin, T., Stich, S. U., Patel, K. K. & Jaggi, M. Don’t use large mini-batches, use local SGD. In International Conference on Learning Representations (2019).

Geng, J., Li, D. & Wang, S. Elasticpipe: An efficient and dynamic model-parallel solution to DNN training. In Proceedings of the 10th Workshop on Scientific Cloud Computing , 5–9 (2019).

Download references

Acknowledgements

A.A. is supported by the ARO (W911NF2110117). R.B. and C.J.B. are supported by NSF grants CCF-1911094, IIS-1838177, and IIS-1730574; ONR grants N00014-18-12571, N00014-20-1-2534, and MURI N00014-20-1-2787; AFOSR grant FA9550-18-1-0478; and a Vannevar Bush Faculty Fellowship, ONR grant N00014-18-1-2047. M.N. and R.A.L.E. are supported by a training fellowship from the Gulf Coast Consortia, on the NLM Training Program in Biomedical Informatics & Data Science (T15LM007093). D.A.A. is partially supported by funds from the University of Houston. A.B., B.K., N.S., and T.J.T. are partially supported by funds from the FunGCAT program from the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via the Army Research Office (ARO) under Federal Award No. W911NF-17-2-0089. T.J.T. is supported by NIH grant P01AI152999 and by NSF grant EF-212638. B.K. is supported by a fellowship from the National Library of Medicine Training Program in Biomedical Informatics and Data Science (5T15LM007093-30, PI: Kavraki). Z.Y., M.E., and L.N. are supported by NSF grants DBI-2030604 and IIS-2106837. R.D. and V.Y. are supported by a Cancer Prevention & Research Institute of Texas (CPRIT) Award (RR190065). V.Y. is a CPRIT Scholar in Cancer Research and is also supported by NIH grant RF1AG054564. A.K. is supported by NSF grants CCF-1907936 and CNS-2003137.

Author information

These authors contributed equally: Nicolae Sapoval, Amirali Aghazadeh.

Authors and Affiliations

Department of Computer Science, Rice University, Houston, TX, USA

Nicolae Sapoval, Michael G. Nute, Advait Balaji, Ruth Dannenfelser, Chen Dun, Mohammadamin Edrisi, R. A. Leo Elworth, Bryce Kille, Anastasios Kyrillidis, Luay Nakhleh, Cameron R. Wolfe, Zhi Yan, Vicky Yao & Todd J. Treangen

Department of Electrical Engineering and Computer Sciences, University of California Berkeley, Berkeley, CA, USA

Amirali Aghazadeh

Department of Biology and Biochemistry, University of Houston, Houston, TX, USA

Dinler A. Antunes

Department of Electrical and Computer Engineering, Rice University, Houston, TX, USA

Richard Baraniuk & C. J. Barberan

Department of Bioengineering, Rice University, Houston, TX, USA

Todd J. Treangen

Contributions

N.S. and A.A. designed figures and conceptualized the manuscript. N.S., A.A., R.A.L.E., and B.K. contributed text to the introduction and to the section on general challenges for deep learning in the biosciences. A.A. contributed text for the genome engineering section. D.A.A. contributed text for the protein structure prediction section. A.B. contributed text for the protein function prediction section. R.B., C.J.B., and A.A. contributed text for the explainability section. C.D., C.R.W., and A.K. contributed text for the training efficiency section. R.D. and V.Y. contributed text for the systems biology and data integration section. M.E., M.G.N., L.N., and Z.Y. contributed text and figures for the phylogenetics section. T.J.T. supervised the work and contributed to manuscript conceptualization. All authors have edited and reviewed the manuscript.

Corresponding author

Correspondence to Todd J. Treangen.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks Bharath Ramsundar, Aurelien Tellier and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Sapoval, N., Aghazadeh, A., Nute, M.G. et al. Current progress and open challenges for applying deep learning across the biosciences. Nat Commun 13, 1728 (2022). https://doi.org/10.1038/s41467-022-29268-7

Received: 25 August 2021

Accepted: 09 March 2022

Published: 01 April 2022

DOI: https://doi.org/10.1038/s41467-022-29268-7

This article is cited by

Harnessing deep learning for population genetic inference

  • Aigerim Rymbekova
  • Martin Kuhlwilm

Nature Reviews Genetics (2024)

Data encoding for healthcare data democratization and information leakage prevention

  • Anshul Thakur
  • Tingting Zhu
  • David A. Clifton

Nature Communications (2024)

Strategies to increase the robustness of microbial cell factories

  • Nuo-Qiao Lin
  • Jian-Zhong Liu

Advanced Biotechnology (2024)

Classification and detection of natural disasters using machine learning and deep learning techniques: A review

  • Kibitok Abraham
  • Moataz Abdelwahab
  • Mohammed Abo-Zahhad

Earth Science Informatics (2024)

Large sample size and nonlinear sparse models outline epistatic effects in inflammatory bowel disease

  • Nora Verplaetse
  • Antoine Passemiers
  • Daniele Raimondi

Genome Biology (2023)
