Causal Reasoning Application in Smart Farming and Ethics: A Systematic Review

Shkurte Luma-Osmani, Florije Ismaili, Bujar Raufi and Xhemal Zenuni, "Causal Reasoning Application in Smart Farming and Ethics: A Systematic Review", Annals of Emerging Technologies in Computing (AETiC), Print ISSN: 2516-0281, Online ISSN: 2516-029X, pp. 10-19, Vol. 4, No. 4, 1st October 2020, Published by International Association of Educators and Researchers (IAER), DOI: 10.33166/AETiC.2020.04.002, Available: http://aetic.theiaer.org/archive/v4/v4n4/p2.html. Review Article


Introduction
In a world described by random variables, some of which may have a causal influence on others, [1] discusses the underlying mathematical framework of causal inference through three fundamental concepts: causation, intervention and mechanisms. Apart from being a fundamental philosophical topic, causal reasoning can be studied and analyzed in almost all disciplines. Among these, artificial intelligence in general, and machine learning in particular, have become important for modelling causal relationships in data and solving causal reasoning problems.
To the authors' knowledge, this study represents the first systematic literature review of causal reasoning in general, with a special focus on causal reasoning applied to smart farming and ethics. The main contribution of this study is the translation of the review into research questions, which are first answered in general terms and then supported by the appropriate literature. The first research question mainly leads to the reminder that human life, and acting ethically towards it, matters more than the results of the research. The answer to the last question provides a deep understanding of the application of causal reasoning in the Smart Farming domain, especially in crop management, livestock management, soil management and water management.
From this perspective, the authors identify potential research paths for causal reasoning applied specifically to smart farming, where more targeted decisions could be made if a more methodical causal analysis were conducted.

Literature Review Methodology
Several approaches to literature review have been presented recently. The review in this study follows Creswell's five-step method [2], as illustrated in figure 1. Each phase has its own outcomes, represented by figures, tables or charts.

Step 1: To retrieve relevant literature, the review starts by defining the main issues related to research ethics and to causal reasoning in smart farming. It targets the ethical issues raised by different research studies and experiments, and focuses on causal reasoning in smart agriculture. To define the search statement for each issue, the main concepts and keywords are extracted as listed in table 1, where double quotes force an exact match in the Boolean search.

Table 1. Main concepts and search statements
Main Concepts | Search Statement
Ethics in Research and Experiments | "Ethics" AND ("IT" OR "Data Mining" OR "Causality" OR "Association Rules" OR "Research" OR "Experiment")
Causal Reasoning for Smart Farming | "Data mining in Agriculture" OR "Cause-effect Relationships in Smart Farming" OR "Causal Rules in Agriculture" OR "Causal Reasoning for Smart Farming"

Step 2: Once the key terms have been identified, the next phase is the search for relevant literature. According to [2], this step involves locating literature on a topic by consulting several types of materials and databases, including those available at an academic library and on the Internet. Accordingly, articles from journals, conference proceedings, books or book chapters, reports and articles posted in electronic sources are included in the search process. Major computer science digital libraries were used: IEEE Xplore, Google Scholar, ResearchGate, ACM Digital Library, Springer Open, DBLP and Elsevier. The percentage share of each database used is presented in the following chart.

Figure 2. Digital Libraries Used
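The two Boolean search statements of Table 1 can be sketched as composable predicates applied to a title or abstract. This is only an illustration under stated assumptions: matching is taken to be case-insensitive and whole-phrase (the effect of the double quotes), whereas the review ran these statements in each digital library's own advanced search, whose matching rules differ. The helper names `phrase`, `all_of` and `any_of` are hypothetical.

```python
import re
from typing import Callable

Predicate = Callable[[str], bool]

def phrase(term: str) -> Predicate:
    """Match `term` as an exact whole phrase (like a double-quoted term), ignoring case."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return lambda text: pattern.search(text) is not None

def all_of(*preds: Predicate) -> Predicate:
    """Boolean AND over sub-queries."""
    return lambda text: all(p(text) for p in preds)

def any_of(*preds: Predicate) -> Predicate:
    """Boolean OR over sub-queries."""
    return lambda text: any(p(text) for p in preds)

# "Ethics" AND ("IT" OR "Data Mining" OR ... OR "Experiment")
# Note: with case-insensitive matching, "IT" also matches the pronoun "it".
ethics_query = all_of(
    phrase("Ethics"),
    any_of(*(phrase(t) for t in ("IT", "Data Mining", "Causality",
                                 "Association Rules", "Research", "Experiment"))),
)

# "Data mining in Agriculture" OR ... OR "Causal Reasoning for Smart Farming"
farming_query = any_of(*(phrase(t) for t in (
    "Data mining in Agriculture",
    "Cause-effect Relationships in Smart Farming",
    "Causal Rules in Agriculture",
    "Causal Reasoning for Smart Farming",
)))

if __name__ == "__main__":
    titles = [  # hypothetical titles
        "Ethics of association rules in medical data",
        "Association rules for retail baskets",
        "A survey of data mining in agriculture",
    ]
    print([t for t in titles if ethics_query(t) or farming_query(t)])
```

Here the second title is rejected because it lacks the mandatory "Ethics" term, while the first and third each satisfy one of the two statements.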
Figure 1. Creswell's five-step literature review method: Step 1, identify key terms; Step 2, locate literature; Step 3, critically evaluate and select the literature; Step 4, organize the literature; Step 5, write a literature review.

Step 3: Critically evaluating and selecting the literature is one of the most important phases of a literature review, because it determines whether each publication is relevant to the conducted research. An advanced search over the digital libraries listed above, using the defined search statements, returned 2574 publications. Based on titles and abstracts, duplicate papers, short papers and papers superseded by an updated version were removed. Furthermore, based on the introduction, main headings and conclusion, papers not supporting the main objective of the research were excluded as well. Only 61 of these publications were selected for further examination. The entire process of selecting and evaluating manuscripts is presented in figure 3.

Figure 3. Literature Evaluation and Selection Process
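The screening in Step 3 can be pictured as a small filtering pipeline. The sketch below is illustrative only. It assumes three things: duplicates and updated versions share a normalized title, a "short paper" is anything below a hypothetical `MIN_PAGES` threshold, and the full-text relevance judgment (introduction, headings, conclusion) has already been recorded by a reviewer as a boolean. The `Paper` fields are hypothetical as well.

```python
from dataclasses import dataclass

MIN_PAGES = 4  # hypothetical cut-off below which a paper counts as "short"

@dataclass(frozen=True)
class Paper:
    title: str
    year: int
    pages: int
    relevant: bool  # reviewer's judgment after reading intro, headings and conclusion

def normalize(title: str) -> str:
    """Case- and whitespace-insensitive key used to spot duplicates and versions."""
    return " ".join(title.lower().split())

def screen(papers: list[Paper]) -> list[Paper]:
    # 1. Duplicates and updated versions: keep only the most recent record per title.
    latest: dict[str, Paper] = {}
    for p in papers:
        key = normalize(p.title)
        if key not in latest or p.year > latest[key].year:
            latest[key] = p
    # 2. Short papers are removed.
    long_enough = [p for p in latest.values() if p.pages >= MIN_PAGES]
    # 3. Papers not supporting the review objective are excluded.
    return [p for p in long_enough if p.relevant]

if __name__ == "__main__":
    retrieved = [  # hypothetical records
        Paper("Causal rules in agriculture", 2017, 8, True),
        Paper("Causal Rules in Agriculture", 2019, 10, True),  # updated version
        Paper("A two-page poster", 2018, 2, True),             # short paper
        Paper("Ethics of retail analytics", 2016, 12, False),  # off-topic
    ]
    print([(p.title, p.year) for p in screen(retrieved)])
```

Deduplicating first means that the relevance check, the costly manual step, is applied only once per distinct publication.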
Step 4: Organizing the literature is the phase of storing and categorizing the relevant publications for further evaluation. A table format is preferred, so that different sort criteria can be applied. For each publication, the reference type, article type, domain, title, main contribution, future work, etc. are available online at https://www.seeu.edu.mk/en/~f.ismaili. Figure 4 presents the Literature Map of the publications most relevant to the study topic. The publications are grouped into those that elaborate on research ethics in general, followed by those on causal reasoning in smart farming.

Step 5: The final step consists of writing a literature review that summarizes the reports and conclusions derived from the literature evaluation. The selected articles are reviewed and categorized based on the classification framework presented above.
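The preference in Step 4 for a table that can be sorted by different criteria can be illustrated with plain records. The field names follow the attributes listed above (reference type, article type, domain, title, main contribution), plus a publication year. The helper `sort_records` and the sample entries are hypothetical placeholders, not the review's actual data.

```python
from dataclasses import dataclass
from operator import attrgetter

@dataclass(frozen=True)
class Record:
    reference_type: str     # e.g. "journal article", "conference proceeding"
    article_type: str       # e.g. "survey", "case study"
    domain: str             # e.g. "ethics", "smart farming"
    title: str
    main_contribution: str
    year: int

def sort_records(records, *criteria, descending=False):
    """Sort by one or more field names, e.g. sort_records(rs, "domain", "year")."""
    return sorted(records, key=attrgetter(*criteria), reverse=descending)

if __name__ == "__main__":
    table = [  # placeholder entries
        Record("journal article", "survey", "smart farming", "Paper A", "n/a", 2019),
        Record("book chapter", "position", "ethics", "Paper B", "n/a", 2012),
        Record("conference proceeding", "case study", "smart farming", "Paper C", "n/a", 2016),
    ]
    for r in sort_records(table, "domain", "year"):
        print(r.domain, r.year, r.title)
```

Passing several field names to `attrgetter` yields a tuple key, so ties on the first criterion are broken by the next one.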
The review of related work is organized around two research questions:
1. What is the main challenge of conducting experiments because of ethical concerns?
2. What are causal reasoning methods and techniques for Smart Agriculture?

Research classification and evaluation process
The results of the systematic literature review are presented as answers to the research questions.

RQ1: What is the main challenge of conducting experiments because of ethical concerns?
The data source plays an important role in the process of data mining. Ethical dilemmas arise constantly as data mining continues to evolve. Until recently, privacy protection and ethical warnings received relatively little interest in mainstream knowledge discovery in databases (KDD) research. However, the (ab)use of sensitive information in KDD is a growing concern [3][4][5].
Since causal discovery is primarily based on experimental studies [6], ethics is an especially significant issue in this area. The authors of [3] argue that producing unique mining rules becomes an ethical issue, mostly when the results are used in decision-making processes that affect people, or whenever mining customer data in an innocent manner jeopardizes those customers' privacy. In their survey, they describe a process for evaluating a rule in terms of its perceived privacy and ethical sensitivity.
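As a rough illustration of screening mined rules for privacy and ethical sensitivity, the sketch below flags association rules whose antecedent or consequent mentions an attribute from a list of sensitive attributes. It is a deliberately simple, hypothetical stand-in, not the evaluation process described in [3]; the attribute names and the `SENSITIVE` set are assumptions.

```python
from dataclasses import dataclass

# Hypothetical list of attributes considered privacy- or ethically sensitive.
SENSITIVE = frozenset({"ethnicity", "religion", "health_status", "income"})

@dataclass(frozen=True)
class Rule:
    antecedent: frozenset  # attribute names on the left-hand side
    consequent: frozenset  # attribute names on the right-hand side
    confidence: float

def sensitive_attributes(rule: Rule, sensitive=SENSITIVE) -> frozenset:
    """Sensitive attributes mentioned anywhere in the rule (empty means not flagged)."""
    return (rule.antecedent | rule.consequent) & sensitive

def flag_rules(rules, sensitive=SENSITIVE):
    """Return (rule, reasons) pairs for rules that need an ethics review before use."""
    return [(r, sensitive_attributes(r, sensitive)) for r in rules
            if sensitive_attributes(r, sensitive)]

if __name__ == "__main__":
    rules = [  # hypothetical mined rules
        Rule(frozenset({"income", "age"}), frozenset({"loan_default"}), 0.81),
        Rule(frozenset({"bread"}), frozenset({"butter"}), 0.64),
    ]
    for rule, reasons in flag_rules(rules):
        print(sorted(reasons), rule.confidence)
```

A flag here only routes a rule to human review; it says nothing about whether using the rule is acceptable, which is the judgment [3] is concerned with.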
Ethical principles must be adhered to in all research fields, especially in health care [7][8][9], and several ethical concerns have also been raised in the web mining area. Web mining refers to data mining and related techniques used to automatically discover and extract information and useful patterns from web documents and services. It poses a threat to important ethical values such as privacy and individuality [10], which must be considered. Web mining makes it difficult for individuals to autonomously control the disclosure and dissemination of data about their private lives. To study these threats, [10] distinguishes between two contexts, "web content and structure mining" and "usage mining". Web content and structure mining is a cause for concern when data published on the web is mined and combined with other data for use in a totally different context. Furthermore, a considerable number of databases could be considered ethically sensitive [3], particularly in areas such as medical and health research. Web usage mining raises privacy concerns when web users are traced without their knowledge and their actions are analyzed.
In his book chapter [11], Cook acknowledges that data miners and decision-makers are clearly obliged to comply with the law, but that ethical requirements are often more stringent than legal ones. According to him, it is a sad fact that a number of IS professionals either lack sufficient understanding of what their company is actually doing with data and data mining results, or conclude that it is not their concern.
As noted above, causal discovery depends essentially on experimental studies, which makes ethics a particularly critical issue in this area. Two studies involving human beings that disregarded any code of morals and ethics are the Tuskegee Experiment and the Willowbrook Study [7]:
• Tuskegee Experiment: To study the long-term consequences of the disease, American researchers in Tuskegee, Alabama deliberately withheld treatment from 399 African-American citizens with syphilis. Even after penicillin became available as a cure, they were deliberately left to suffer from syphilis.
• Willowbrook Study (1963-1966): Hepatitis was deliberately induced in children with developmental disabilities. The purpose of the study was to investigate the course of the disease and to test a potential immunization.

RQ2: What are Causal Reasoning Methods and Techniques for Smart Agriculture?
The review of related work on the application of causal reasoning to smart agriculture is organized around the four major aspects of smart farming classified in [12]: crop management, livestock management, water management and soil management.

Causal Reasoning in crop management
Crop management is one of the most important tasks in precision agriculture. It represents an important niche in agricultural food production for delivering top-quality, in-demand and disease-free crops. Crop management covers very important aspects such as yield prediction, which includes estimating the amount of crop expected for a particular season (yield estimation), mapping the right crop cultures to suitable climatic regions (yield mapping), matching crop demand with supply (yield matching), and crop management aimed at increasing yield production.
It is worth noting that machine learning algorithms for crop management vastly outnumber causal reasoning approaches [13]. Functionalities such as counting coffee fruits on coffee branches through feature extraction from coffee images using Support [...]. ANNs are also used in [20], where a method for accurate agricultural yield prediction is introduced using agricultural datasets that comprise historical meteorological, environmental, economic and harvest records. Within crop management, a substantial number of articles concentrate on subcategories such as disease detection and yield prediction. Within disease detection, we identified papers on detecting and discriminating between healthy and fungus-infected crops in the case of Silybum marianum [21]; detecting nitrogen stress and yellow rust infections in wheat [22][23][24] using ANNs and Kohonen Self-Organizing Maps (SOMs); classifying and detecting parasites [25][26] in specific crops such as strawberries and rice; detecting water stress and parasites in wheat [27] using SVM; and even applying deep learning to the detection and diagnosis of diseases in generic plants [28][29]. In yield prediction, we identified papers using SVM and ANNs for estimating yield revenue [30][31]. Chingarayan [31] uses remote sensing techniques to estimate yield through nitrogen status estimation in soil, whilst Shao [30] introduces a method for estimating the nitrogen status in soil for rice using the Least Squares Support Vector Machine (LS-SVM) model, compared with partial least squares (PLS) and back-propagation neural network methods.
Some approaches that partially complement causal reasoning focus on design reasoning in agronomy, as in [32], which aims to holistically improve design reasoning in agriculture. The paper sheds light on crop development and attempts to provide better insight into how new agronomic approaches emerge. The authors also analyze the impact and contribution of design reasoning to better understand the various reasoning patterns in agronomy, and they point out future research paths aimed at enriching agronomists' "design toolbox" and co-design in agriculture, that is, at how design reasoning can fit into the overall process. There are also semantic approaches to causal reasoning in agriculture, such as AgriNET [33], a semantic knowledge base framework for decision support in smart agriculture. The framework uses a rule-based inference engine based on the SWRL language. However, none of the above approaches addresses causal inference.
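The general idea behind a rule-based inference engine, such as the SWRL-based one in AgriNET [33], can be sketched as forward chaining over a set of facts. This is neither AgriNET's implementation nor SWRL syntax. The facts are hypothetical (subject, property, value) triples, and the two rules and the moisture threshold are invented for illustration. The example also shows the limitation noted above: the rules derive new facts from conditions, but they encode no causal model.

```python
def low_moisture_rule(facts):
    """IF soilMoisture(f) < 20 THEN needsIrrigation(f). The threshold is hypothetical."""
    return {(s, "needsIrrigation", True)
            for s, p, v in facts if p == "soilMoisture" and v < 20}

def schedule_rule(facts):
    """IF needsIrrigation(f) AND cropType(f) = maize THEN action(f) = scheduleIrrigation."""
    needs = {s for s, p, v in facts if p == "needsIrrigation" and v is True}
    maize = {s for s, p, v in facts if p == "cropType" and v == "maize"}
    return {(s, "action", "scheduleIrrigation") for s in needs & maize}

def forward_chain(facts, rules):
    """Apply every rule repeatedly until no new facts are derived (a fixpoint)."""
    known = set(facts)
    while True:
        derived = set().union(*(rule(known) for rule in rules)) - known
        if not derived:
            return known
        known |= derived

if __name__ == "__main__":
    facts = {  # hypothetical sensor facts
        ("field1", "soilMoisture", 12), ("field1", "cropType", "maize"),
        ("field2", "soilMoisture", 35), ("field2", "cropType", "maize"),
    }
    result = forward_chain(facts, [low_moisture_rule, schedule_rule])
    print(sorted(f for f in result if f[1] == "action"))
```

The second rule fires only on the second pass, after the first rule has added `needsIrrigation`. This is why the loop runs until a fixpoint instead of applying each rule once.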

Causal inference in livestock management
Livestock management is an important aspect of animal welfare in smart agriculture. Predicting livestock yield and diseases, and making proper decisions, are important for ensuring maximum food production.
Even though substantial work has been conducted on machine learning for livestock management, causal models in this category are still lacking. In this direction, we can mention the classification of cattle behaviour using bagging with tree learners [34]; identification and classification of chewing patterns in calves using decision trees [35]; animal tracking and behaviour annotation to measure behavioural changes in pigs for welfare and health monitoring using Gaussian mixture models (GMMs) [36]; prediction of rumen fermentation patterns from milk fatty acids in cattle using ANNs [37]; and early detection and warning in commercial hen egg production, as well as weight trajectory prediction in cattle, using SVM [38][39][40].
Causal inference is treated in [41], which attempts to infer causal relationships from observational data in livestock. The paper stresses the complexity of confounding in such environments, where data mining techniques were used for some specific cases. Other papers on causal reasoning that address decision-making in livestock management include [42][43][44].
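The reason confounding complicates causal inference from observational livestock data can be shown with a small worked example. The counts below are hypothetical. A treatment (say, a feed supplement) is given mostly in large herds, and herd size also affects animal health. Ignoring herd size makes the treatment look harmful. Adjusting for it with the standard backdoor formula, sum over z of P(z) * [P(healthy | treated, z) - P(healthy | untreated, z)], recovers the benefit seen within each stratum. The adjustment is valid only under the assumption that herd size is the sole confounder.

```python
from fractions import Fraction

# Hypothetical counts: (herd size, treated?) -> (healthy animals, animals observed).
COUNTS = {
    ("small", 1): (9, 10),
    ("small", 0): (72, 90),
    ("large", 1): (45, 90),
    ("large", 0): (4, 10),
}

def naive_effect(counts):
    """P(healthy | treated) - P(healthy | untreated), ignoring herd size."""
    def rate(x):
        cells = [hn for (_z, t), hn in counts.items() if t == x]
        return Fraction(sum(h for h, _ in cells), sum(n for _, n in cells))
    return rate(1) - rate(0)

def adjusted_effect(counts):
    """Backdoor adjustment over herd size z:
    sum_z P(z) * [P(healthy | treated, z) - P(healthy | untreated, z)]."""
    grand_total = sum(n for _, n in counts.values())
    effect = Fraction(0)
    for z in sorted({z for z, _ in counts}):
        p_z = Fraction(counts[(z, 1)][1] + counts[(z, 0)][1], grand_total)
        effect += p_z * (Fraction(*counts[(z, 1)]) - Fraction(*counts[(z, 0)]))
    return effect

if __name__ == "__main__":
    print(naive_effect(COUNTS))     # -11/50: treatment looks harmful
    print(adjusted_effect(COUNTS))  # 1/10: treatment helps within each herd size
```

Exact fractions are used so the sign reversal is not blurred by rounding. If an unmeasured confounder also existed, neither number would be a reliable causal effect.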

Causal inference in water management
Water management in smart agriculture requires considerable effort and plays a significant role in hydrological, climatological and agronomical balance.
Papers on data mining and machine learning approaches focus mainly on evapotranspiration: estimation of monthly mean reference evapotranspiration in arid and semi-arid regions using regression techniques [45][46], estimation of weekly evapotranspiration from data collected at two meteorological weather stations, and prediction of daily dew point temperature using ANNs [47][48].
Causal reasoning in water management appears in [49], which attempts to analyze the temporal behaviour of river runoff. The causal analysis is carried out with Bayesian Networks (BN), which capture only statistical dependencies between attributes without seriously considering causation, confounding and counterfactuals. A similar approach is seen in [50][51], which focus on planning the information needs of water management systems using conditional probability networks (CPNs), which are essentially Bayesian belief networks.
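The gap between the conditional probabilities a Bayesian network computes and an actual causal effect can be shown on a three-node network Z -> X, Z -> Y, X -> Y. The binary variables are hypothetical (for instance, Z a wet season, X a reservoir release, Y high runoff), and the probabilities are invented. Conditioning on X = 1 by enumeration gives P(Y=1 | X=1) = 41/50. The interventional query P(Y=1 | do(X=1)), computed with the truncated factorization that deletes X's own factor, gives 7/10. The two differ because Z confounds X and Y. The sketch assumes the arrows are causal, which a purely statistical network does not guarantee.

```python
from fractions import Fraction
from itertools import product

# Hypothetical conditional probability tables for Z -> X, Z -> Y, X -> Y.
P_Z1 = Fraction(1, 2)                                   # P(Z=1)
P_X1_GIVEN_Z = {1: Fraction(4, 5), 0: Fraction(1, 5)}   # P(X=1 | Z=z)
P_Y1_GIVEN_XZ = {(1, 1): Fraction(9, 10), (1, 0): Fraction(1, 2),
                 (0, 1): Fraction(7, 10), (0, 0): Fraction(1, 10)}  # P(Y=1 | X=x, Z=z)

def bernoulli(p_one, value):
    """Probability of a binary variable taking `value` when P(variable=1) = p_one."""
    return p_one if value == 1 else 1 - p_one

def joint(z, x, y):
    """P(Z=z, X=x, Y=y) from the network's factorization."""
    return (bernoulli(P_Z1, z) * bernoulli(P_X1_GIVEN_Z[z], x)
            * bernoulli(P_Y1_GIVEN_XZ[(x, z)], y))

def p_y1_given_x(x):
    """Observational query P(Y=1 | X=x), by enumeration over the joint."""
    numerator = sum(joint(z, x, 1) for z in (0, 1))
    denominator = sum(joint(z, x, y) for z, y in product((0, 1), repeat=2))
    return numerator / denominator

def p_y1_do_x(x):
    """Interventional query P(Y=1 | do(X=x)): drop P(X | Z), keep P(Z) and P(Y | X, Z)."""
    return sum(bernoulli(P_Z1, z) * P_Y1_GIVEN_XZ[(x, z)] for z in (0, 1))

if __name__ == "__main__":
    print(p_y1_given_x(1), p_y1_do_x(1))  # 41/50 7/10
```

The observational contrast P(Y=1 | X=1) - P(Y=1 | X=0) is 3/5, twice the interventional contrast of 3/10. Reading the network's dependencies as effects would overstate the influence of X.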

Causal Inference in soil management
The final category of reviewed smart agriculture papers, as categorized in [12], concerns causal inference in soil management. Machine learning contributions in this category focus on predicting and identifying agricultural soil properties, such as soil drying, condition, temperature and moisture content. Here we can distinguish approaches such as the evaluation of soil drying for agricultural planning using k-nearest neighbours (KNN) and Artificial Neural Networks (ANNs) [52], prediction of soil organic carbon (OC), moisture content (MC) and total nitrogen (TN) in around 140 soil samples using SVM and regression techniques [53], estimation of soil temperatures at various depths using ANNs [54], and soil moisture estimation [55], to name a few.
In soil management, there are contributions towards causal reasoning in winnowed soils, such as identifying the role of the microbiome in plant invasiveness through microbial networks [56] using structural equation modelling (SEM). In [57], risk factors in soils are quantified using Bayesian Networks. The inference relies solely on the dependence among factors that quantify soil rehabilitation in coal regions, modelled with Bayesian belief nets. In [58], a causation method is proposed for risk assessment of heavy metals in soil; it uses correlations found in the data as a basis for causation. The paper does not indicate how causal conclusions are drawn from data alone, bearing in mind that, from a causal reasoning standpoint, correlation does not imply causation [
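The point that correlation alone does not establish causation can be shown with simulated data. In this data, a common cause Z drives both X and Y, while X has no effect on Y. The variables are hypothetical (Z could stand for a soil property), and partial correlation is used as a simple linear check. X and Y are clearly correlated, but the correlation essentially vanishes once Z is regressed out of both. Under linear-Gaussian assumptions, a near-zero partial correlation is consistent with no direct effect; it does not by itself prove the causal structure.

```python
import random

def pearson(a, b):
    """Sample Pearson correlation coefficient."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

def residuals(target, control):
    """Residuals of an ordinary least-squares fit of `target` on `control`."""
    n = len(target)
    mt, mc = sum(target) / n, sum(control) / n
    slope = (sum((c - mc) * (t - mt) for c, t in zip(control, target))
             / sum((c - mc) ** 2 for c in control))
    return [(t - mt) - slope * (c - mc) for t, c in zip(target, control)]

def partial_correlation(a, b, control):
    """Correlation of a and b after removing the linear effect of `control` from both."""
    return pearson(residuals(a, control), residuals(b, control))

def simulate(n=5000, seed=42):
    """Hypothetical data: Z causes both X and Y; X has no effect on Y."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [zi + rng.gauss(0, 1) for zi in z]
    y = [zi + rng.gauss(0, 1) for zi in z]
    return z, x, y

if __name__ == "__main__":
    z, x, y = simulate()
    print(f"corr(X, Y)     = {pearson(x, y):.2f}")                 # about 0.5
    print(f"corr(X, Y | Z) = {partial_correlation(x, y, z):.2f}")  # about 0
```

With unit-variance Z and noise, the population correlation of X and Y is 0.5 even though neither causes the other.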

Literature Review Summary - Challenges and Limitations
The review process follows the five steps identified by [2]. The summarized results of the collected and analyzed related work are grouped according to:
• The computer science digital library where the study is published: IEEE Xplore, Elsevier, Google Scholar, ResearchGate, ACM Digital Library, Springer Open, DBLP.
• Reference type, which categorizes the publication as a journal article, conference proceeding, book chapter, etc.
• The year of publication and the country where the research was conducted, in order to assess how current the problem definitions are.
• Data repositories used in case studies of causal reasoning applications: public, synthetic, public-synthetic and theoretical. The intention is to demonstrate that further research in this direction can be done using data sets available online, such as Data.gov, Kaggle, Amazon, Google, UCI, etc.
• Problem definition, in order to identify the domain of the causal reasoning application: health, smart farming, etc.
• Categorization of the publication according to its relevance to the research questions, which are evaluated and answered in the section above.
The methodology employed in the literature review has some limitations:
• The study analyzes articles retrieved with specific keywords such as "agriculture", "smart farming", "ethics", etc. Articles without these keywords may have been omitted during the retrieval process.
• Findings are based on data collected only from well-known digital libraries, so other materials that may contain more case studies on causal reasoning, including causal reasoning in smart farming, might have been excluded.
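The grouping criteria listed above amount to frequency tables over the review's records. A minimal sketch with `collections.Counter` follows. The three records are placeholders rather than the review's actual data, and the field names and the helper `tally` are assumptions.

```python
from collections import Counter

# Placeholder records; the review's full table is published online (see Step 4).
reviewed = [
    {"library": "IEEE Xplore", "reference_type": "journal article",
     "year": 2019, "repository": "public", "domain": "smart farming", "rq": 2},
    {"library": "Springer Open", "reference_type": "book chapter",
     "year": 2012, "repository": "theoretical", "domain": "ethics", "rq": 1},
    {"library": "IEEE Xplore", "reference_type": "conference proceeding",
     "year": 2016, "repository": "synthetic", "domain": "smart farming", "rq": 2},
]

def tally(records, *fields):
    """Count records per value of one field, or per combination of several fields."""
    if len(fields) == 1:
        return Counter(r[fields[0]] for r in records)
    return Counter(tuple(r[f] for f in fields) for r in records)

if __name__ == "__main__":
    print(tally(reviewed, "library"))
    print(tally(reviewed, "domain", "rq"))
```

Passing two fields gives a cross-tabulation, for example domain against research question, which is the kind of view needed to relate publications to RQ1 and RQ2.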

Conclusion
Machine learning methods, techniques and algorithms have stimulated serious research efforts in discovering causal relationships in different datasets and domains. We survey the most recent and significant studies on this topic from relevant journals and conference proceedings, from multidimensional perspectives. Causal inference was reviewed across four application areas: water, soil, livestock and crop management. The results indicate that causal reasoning is gaining momentum in research in different domains, including smart agriculture. Moreover, the review shows that causal reasoning has been only partially explored in this area of agriculture. However, a number of challenges still need to be addressed. The first important issue is the generation, availability and ownership of appropriate datasets for agriculture. Policies and standards for quality data that foster trust remain to be established. In addition, causal reasoning approaches in smart agriculture still have a long road ahead to full maturity in terms of efficiency, reliability, cost-effectiveness and usefulness. Finally, the potential misuse of data and of causal reasoning findings creates additional ethical and legal challenges that require definition and regulation through a normative framework.