Amphibious plants, living in land-water ecotones, have to cope with challenging and continuously changing growth conditions in their habitats with respect to nutrient and light availability. They have thus evolved a variety of mechanisms to tolerate and adapt to these changes. Therefore, the study of these plants is a major area of ecophysiology and environmental ecological research. However, our understanding of their capacity for physiological adaptation and tolerance remains limited and requires systemic approaches for comprehensive analyses. To this end, in this study, we have conducted a mesocosm experiment to analyze the response of Butomus umbellatus, a common amphibious species in Denmark, to nutrient enrichment and shading. Our study follows a systematic integration of morphological (including plant height, leaf number, and biomass accumulation), ecophysiological (photosynthesis-irradiance responses, leaf pigment content, and C and N content in plant organs), and leaf metabolomic measurements using gas chromatography-mass spectrometry (39 mainly primary metabolites), based on bioinformatic methods. No studies of this type have been previously reported for this plant species. We observed that B. umbellatus responds to nutrient enrichment and light reduction through different mechanisms and were able to identify its nutrient enrichment acclimation threshold within the applied nutrient gradient. Up to that threshold, the morpho-physiological response to nutrient enrichment was profound, indicating fast-growing trends (higher growth rates and biomass accumulation), but only a few parameters changed significantly from light to shade [specific leaf area (SLA); quantum yield (φ)]. Metabolomic analysis supported the morpho-physiological results regarding nutrient overloading and also indicated subtle changes due to shading that were not directly apparent in the other measurements.
The combined profile analysis revealed leaf metabolite and morpho-physiological parameter associations. In this context, leaf lactate, whose role in higher plants is currently uncertain, emerged as a shading acclimation biomarker, along with SLA and φ. The study enhances both the ecophysiology methodological toolbox and our knowledge of the adaptive capacity of amphibious species. It demonstrates that the informed combination of physiological and metabolomic measurements through bioinformatic methods is a promising approach for ecophysiology research, enabling the elucidation of discriminatory metabolic shifts to be used for early diagnosis and even prognosis of natural ecosystem responses to climate change.
Abstract Ribosomal genes produce the constituents of the ribosome, one of the most conserved subcellular structures of all cells, from bacteria to eukaryotes, including animals. It has been suggested that some protein-coding ribosomal genes vary in their roles across species, particularly vertebrates, through the involvement of some of them in a number of genetic diseases. Based on extensive sequence comparisons and systematic curation, we establish a reference set for ribosomal proteins (RPs) in eleven vertebrate species and quantify their sequence conservation levels. Moreover, we correlate their coordinated gene expression patterns within up to 33 tissues and assess the exceptional role of paralogs in tissue specificity. Importantly, our analysis, supported by the development and use of machine learning models, strongly suggests that the variation in the observed tissue-specific gene expression of RPs is species-related rather than due to tissue-based evolutionary processes. The data obtained suggest that RPs exhibit a complex relationship between their structure and function that broadly maintains a consistent expression landscape across tissues, while most of the variation arises from species idiosyncrasies. The latter may be due to evolutionary change and adaptation, rather than functional constraints at the tissue level throughout the vertebrate lineage.
The genome of SARS-CoV-2, the coronavirus responsible for the Covid-19 pandemic, encodes a number of accessory genes. The longest accessory gene, Orf3a, plays important roles in the virus lifecycle, as indicated by experimental findings, known polymorphisms, its evolutionary trajectory and a distinct three-dimensional fold. Here we show that supervised, sensitive database searches with Orf3a detect weak, yet significant and highly specific similarities to the M proteins of coronaviruses. The similarity profiles can be used to derive low-resolution three-dimensional models for M proteins based on Orf3a as a structural template. The models also explain the emergence of Orf3a from M proteins and suggest a recent origin across the coronavirus lineage, as reflected in its restricted phylogenetic distribution. This study provides evidence for the common origin of M and Orf3a families and proposes for the first time a working model for the structure of the universally distributed M proteins in coronaviruses, consistent with the properties of both protein families.
The statistically equivalent signature (SES) algorithm is a method for feature selection inspired by the principles of constraint-based learning of Bayesian networks. Most of the currently available feature selection methods return only a single subset of features, supposedly the one with the highest predictive power. We argue that in several domains multiple subsets can achieve close to maximal predictive accuracy, and that arbitrarily providing only one has several drawbacks. The SES method attempts to identify multiple predictive feature subsets whose performances are statistically equivalent. In that respect the SES algorithm subsumes and extends previous feature selection algorithms, like the max-min parents and children algorithm. The SES algorithm is implemented in a function of the same name included in the R package MXM, standing for mens ex machina, meaning 'mind from the machine' in Latin. The MXM implementation of SES handles several data analysis tasks, namely classification, regression and survival analysis. In this paper we present the SES algorithm and its implementation, and provide examples of use of the SES function in R. Furthermore, we analyze three publicly available data sets to illustrate the equivalence of the signatures retrieved by SES and to contrast SES against the state-of-the-art feature selection method LASSO. Our results provide initial evidence that the two methods perform comparably well in terms of predictive accuracy and that multiple, equally predictive signatures are actually present in real world data.
Abstract Motivation: In the context of genome-wide association studies (GWAS), a variety of statistical techniques is available for conducting the analysis, but in most cases the underlying genetic model is unknown. Under these circumstances, the classical Cochran-Armitage trend test (CATT) is suboptimal. Robust procedures that maximize the power and preserve the nominal type I error rate are preferable. Moreover, performing a meta-analysis using robust procedures is of great interest and has not been addressed previously. The primary goal of this work is to implement several robust methods for analysis and meta-analysis in the statistical package Stata and subsequently to make the software available to the scientific community. Results: The CATT under a recessive, additive and dominant model of inheritance, as well as robust methods based on the Maximum Efficiency Robust Test statistic, the MAX statistic and the MIN2, were implemented in Stata. Concerning MAX and MIN2, we calculated their asymptotic null distributions relying on numerical integration, resulting in a great gain in computational time without losing accuracy. All the aforementioned approaches were employed in a fixed or a random effects meta-analysis setting using summary data with weights equal to the reciprocal of the combined cases and controls. Overall, this is the first complete effort to implement procedures for analysis and meta-analysis in GWAS using Stata. Availability and Implementation: A Stata program and a web-server are freely available for academic users at http://www.compgen.org/tools/GWAR. Supplementary information: Supplementary data are available at Bioinformatics online.
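As an illustration of the statistics named above, the following is a minimal Python sketch (not the Stata implementation described in the abstract) of the CATT Z statistic under the recessive, additive and dominant scores, together with a MAX statistic taken as the largest absolute Z over the three models. The function names, the score coding (0, 0, 1), (0, 0.5, 1) and (0, 1, 1), and the genotype counts are assumptions made for the example. The two-sided p-value helper applies only to a single CATT; the null distribution of MAX is not normal and is not computed here.

```python
import math

# Assumed genotype scores (AA, AB, BB) for each inheritance model,
# with B taken as the risk allele.
SCORES = {
    "recessive": (0.0, 0.0, 1.0),
    "additive": (0.0, 0.5, 1.0),
    "dominant": (0.0, 1.0, 1.0),
}


def catt_z(cases, controls, scores):
    """Cochran-Armitage trend test Z statistic for a 2x3 genotype table.

    cases, controls: counts of genotypes (AA, AB, BB).
    Assumes the table is not degenerate (the variance term is non-zero).
    """
    n_cases, n_controls = sum(cases), sum(controls)
    n_total = n_cases + n_controls
    totals = [r + s for r, s in zip(cases, controls)]
    # Score-weighted contrast between case and control genotype counts.
    t = sum(x * (n_controls * r - n_cases * s)
            for x, r, s in zip(scores, cases, controls))
    sum_xn = sum(x * n for x, n in zip(scores, totals))
    sum_x2n = sum(x * x * n for x, n in zip(scores, totals))
    # Variance of the contrast under the null hypothesis of no association.
    var = n_cases * n_controls * (n_total * sum_x2n - sum_xn ** 2) / n_total
    return t / math.sqrt(var)


def two_sided_p(z):
    """Two-sided p-value of a standard normal Z (valid for a single CATT)."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def max_statistic(cases, controls):
    """MAX statistic: largest absolute CATT Z over the three models."""
    return max(abs(catt_z(cases, controls, x)) for x in SCORES.values())


if __name__ == "__main__":
    # Hypothetical genotype counts, for illustration only.
    cases, controls = (60, 100, 40), (90, 90, 20)
    for model, x in SCORES.items():
        z = catt_z(cases, controls, x)
        print(model, round(z, 3), two_sided_p(z))
    print("MAX", round(max_statistic(cases, controls), 3))
```

Under the dominant (or recessive) scores the squared Z reduces to the Pearson chi-square of the corresponding collapsed 2x2 table, which gives a convenient sanity check for the sketch.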
Abstract DIANA-LncBase v3.0 (www.microrna.gr/LncBase) is a reference repository with experimentally supported miRNA targets on non-coding transcripts. Its third version provides approximately half a million entries, corresponding to ∼240 000 unique tissue and cell type specific miRNA–lncRNA pairs. This compilation of interactions is derived from the manual curation of publications and the analysis of >300 high-throughput datasets. miRNA targets are supported by 14 experimental methodologies, applied to 243 distinct cell types and tissues in human and mouse. The largest part of the database consists of high-confidence, AGO-CLIP-derived miRNA-binding events. LncBase v3.0 is the first relevant database to employ a robust CLIP-Seq-guided algorithm, the microCLIP framework, to analyze 236 AGO-CLIP-Seq libraries and catalogue ∼370 000 miRNA binding events. The database was redesigned from the ground up, providing new functionalities. Known short variant information on >67 000 experimentally supported target sites, as well as lncRNA expression profiles in different cellular compartments, are provided to users. Interactive visualization plots, portraying correlations of miRNA–lncRNA pairs, as well as lncRNA expression profiles in a wide range of cell types and tissues, are presented for the first time through a dedicated page. LncBase v3.0 constitutes a valuable asset for ncRNA research, providing new insights into the still widely unexplored lncRNA functions.
Abstract Summary: Identifying, amongst millions of publications available in MEDLINE, those that are relevant to specific microRNAs (miRNAs) of interest based on keyword search faces major obstacles. References to miRNA names in the literature often deviate from standard nomenclature for various reasons, since even the official nomenclature evolves. For instance, a single miRNA name may identify two completely different molecules or two different names may refer to the same molecule. mirPub is a database with a powerful and intuitive interface, which facilitates searching for miRNA literature, addressing the aforementioned issues. To provide effective search services, mirPub applies text mining techniques on MEDLINE, integrates data from several curated databases and exploits data from its user community following a crowdsourcing approach. Other key features include an interactive visualization service that illustrates intuitively the evolution of miRNA data, tag clouds summarizing the relevance of publications to particular diseases, cell types or tissues and access to TarBase 6.0 data to oversee genes related to miRNA publications. Availability and Implementation: mirPub is freely available at http://www.microrna.gr/mirpub/. Contact: email@example.com or firstname.lastname@example.org Supplementary information: Supplementary data are available at Bioinformatics online.
Abstract Idiopathic pulmonary fibrosis is a lethal lung fibroproliferative disease with limited therapeutic options. Differential expression profiling of affected sites has been instrumental in dissecting the pathogenetic mechanisms involved and in discovering therapeutic targets. However, there have been limited efforts to comparatively analyse and mine the numerous related publicly available datasets, to fully exploit their potential for the validation or creation of novel research hypotheses. In this context and towards that goal, we present Fibromine, an integrated database and exploration environment comprising consistently re-analysed, manually curated transcriptomic and proteomic pulmonary fibrosis datasets covering a wide range of experimental designs in both patients and animal models. Fibromine can be accessed via an R Shiny application (http://www.fibromine.com/Fibromine) which offers dynamic data exploration and real-time integration functionalities. Moreover, we introduce a novel benchmarking system based on the underlying characteristics of transcriptomic datasets, resulting in dataset accreditation that aims to aid the user in dataset selection. Cell specificity of gene expression can be visualised and/or explored in several scRNA-seq datasets, in an effort to link legacy data with this cutting-edge methodology and pave the way to their integration. Several use case examples are presented that, importantly, can be reproduced on the fly by a non-specialist user, the primary target and potential user of this endeavour.
During recent years, X-ray microtomography (micro-CT) has seen an increasing use in biological research areas, such as functional morphology, taxonomy, evolutionary biology and developmental research. Micro-CT is a technology which uses X-rays to create sub-micron resolution images of external and internal features of specimens. These images can then be rendered in a three-dimensional space and used for qualitative and quantitative 3D analyses. However, micro-CT datasets are rarely made available to the public for online exploration and dissemination, due to their large size and a lack of dedicated online platforms for the interactive manipulation of 3D data. Here, the development of a virtual micro-CT laboratory (Micro-CTvlab) is described, which can be used by anyone who is interested in digitisation methods and biological collections and aims at making the micro-CT data exploration of natural history specimens freely available over the internet. The Micro-CTvlab offers the user virtual image galleries of various taxa which can be displayed and downloaded through a web application. With a few clicks, accurate, detailed and three-dimensional models of species can be studied and virtually dissected without destroying the actual specimen. The data and functions of the Micro-CTvlab can be accessed either on a normal computer or through a dedicated version for mobile devices.
We describe here OMPdb, which is currently the most complete and comprehensive collection of integral β-barrel outer membrane proteins from Gram-negative bacteria. The database currently contains 69 354 proteins, which are classified into 85 families, based mainly on structural and functional criteria. Although OMPdb follows the annotation scheme of Pfam, many of the families included in the database were not previously described or annotated in other publicly available databases. There are also cross-references to other databases, references to the literature and annotation for sequence features, like transmembrane segments and signal peptides. Furthermore, via the web interface, the user can not only browse the available data, but also submit advanced text searches and run BLAST queries against the database protein sequences, or domain searches against the collection of profile Hidden Markov Models that represent each family’s domain organization. The database is freely accessible for academic users at http://bioinformatics.biol.uoa.gr/OMPdb and we expect it to be useful for genome-wide analyses, comparative genomics as well as for providing training and test sets for predictive algorithms regarding transmembrane β-barrels.
Abstract Two-layered hollow-fiber membrane bioreactors containing immobilized enzymes are investigated in this work at mesoscopic and macroscopic scales. Layer-by-layer reconstruction of the membrane in three dimensions is achieved using two different stochastic techniques that draw data only from electron microscopy images of the structure. The reconstructed layers are subsequently used for the prediction of effective transport properties through the numerical solution of the relevant transport equations. The methodology is used for the investigation of hydrolysis of lactose, employing a convective diffusion and reaction macroscopic model. Direct comparison of model predictions with experimental data from the literature reveals a good agreement, provided that one uses the effective transport coefficients that are calculated from the reconstruction process. A parametric investigation of the hydrolysis process and its efficiency was conducted. The three-dimensional reconstructions are also used for the modeling of the hydrolysis of lactose at the pore scale, taking into account mass transport limitations and applying the same hydrolysis kinetics as in the macroscopic model. The agreement was quite satisfactory. Among the advantages of the pore scale approach that is formulated here is its ability to correlate pore structure details with the performance of the lactose hydrolysis process at the inevitable expense of computational resources.
Data sharing, integration and annotation are essential to ensure the reproducibility of the analysis and interpretation of the experimental findings. Often these activities are perceived as a role that bioinformaticians and computer scientists have to take on with little or no input from the experimental biologist. On the contrary, biological researchers, being the producers and often the end users of such data, have a big role in enabling biological data integration. The quality and usefulness of data integration depend on the existence and adoption of standards, shared formats, and mechanisms that are suitable for biological researchers to submit and annotate the data, so that they can be easily searched, conveniently linked and consequently used for further biological analysis and discovery. Here, we provide background on what data integration is from a computational science point of view, how it has been applied to biological research, which key aspects contributed to its success, and future directions.