The following results are related to ELIXIR GR. Are you interested to view more results? Visit OpenAIRE - Explore.
101 Research products, page 1 of 11

  • Open Access
    Authors: 
    Ouzounis, Christos A.;
    Publisher: Elsevier BV

    The genome of SARS-CoV-2, the coronavirus responsible for the Covid-19 pandemic, encodes a number of accessory genes. The longest accessory gene, Orf3a, plays important roles in the virus lifecycle indicated by experimental findings, known polymorphisms, its evolutionary trajectory and a distinct three-dimensional fold. Here we show that supervised, sensitive database searches with Orf3a detect weak, yet significant and highly specific similarities to the M proteins of coronaviruses. The similarity profiles can be used to derive low-resolution three-dimensional models for M proteins based on Orf3a as a structural template. The models also explain the emergence of Orf3a from M proteins and suggest a recent origin across the coronavirus lineage, enunciated by its restricted phylogenetic distribution. This study provides evidence for the common origin of M and Orf3a families and proposes for the first time a working model for the structure of the universally distributed M proteins in coronaviruses, consistent with the properties of both protein families.

  • Open Access English
    Authors: 
    Paraskevi Manolaki; Georgia Tooulakou; Caroline Urup Byberg; Franziska Eller; Brian K. Sorrell; Maria I. Klapa; Tenna Riis;
    Publisher: Frontiers Media S.A.

    Amphibious plants, living in land-water ecotones, have to cope with challenging and continuously changing growth conditions in their habitats with respect to nutrient and light availability. They have thus evolved a variety of mechanisms to tolerate and adapt to these changes. Therefore, the study of these plants is a major area of ecophysiology and environmental ecological research. However, our understanding of their capacity for physiological adaptation and tolerance remains limited and requires systemic approaches for comprehensive analyses. To this end, in this study, we have conducted a mesocosm experiment to analyze the response of Butomus umbellatus, a common amphibious species in Denmark, to nutrient enrichment and shading. Our study follows a systematic integration of morphological (including plant height, leaf number, and biomass accumulation), ecophysiological (photosynthesis-irradiance responses, leaf pigment content, and C and N content in plant organs), and leaf metabolomic measurements using gas chromatography-mass spectrometry (39 mainly primary metabolites), based on bioinformatic methods. No studies of this type have been previously reported for this plant species. We observed that B. umbellatus responds to nutrient enrichment and light reduction through different mechanisms and were able to identify its nutrient enrichment acclimation threshold within the applied nutrient gradient. Up to that threshold, the morpho-physiological response to nutrient enrichment was profound, indicating fast-growing trends (higher growth rates and biomass accumulation), but only few parameters changed significantly from light to shade [specific leaf area (SLA); quantum yield (φ)]. Metabolomic analysis supported the morpho-physiological results regarding nutrient overloading, indicating also subtle changes due to shading not directly apparent in the other measurements. 
    The combined profile analysis revealed leaf metabolite and morpho-physiological parameter associations. In this context, leaf lactate, currently of uncertain role in higher plants, emerged as a shading acclimation biomarker, along with SLA and φ. The study enhances both the ecophysiology methodological toolbox and our knowledge of the adaptive capacity of amphibious species. It demonstrates that the educated combination of physiological with metabolomic measurements using bioinformatic approaches is a promising approach for ecophysiology research, enabling the elucidation of discriminatory metabolic shifts to be used for early diagnosis and even prognosis of natural ecosystem responses to climate change.

  • Open Access English
    Authors: 
    Thanasis Vergoulis; Ilias Kanellos; Nikos Kostoulas; Georgios Georgakilas; Timos Sellis; Artemis G. Hatzigeorgiou; Theodore Dalamagas;
    Publisher: Oxford University Press

    Summary: Identifying, amongst millions of publications available in MEDLINE, those that are relevant to specific microRNAs (miRNAs) of interest based on keyword search faces major obstacles. References to miRNA names in the literature often deviate from standard nomenclature for various reasons, since even the official nomenclature evolves. For instance, a single miRNA name may identify two completely different molecules or two different names may refer to the same molecule. mirPub is a database with a powerful and intuitive interface, which facilitates searching for miRNA literature, addressing the aforementioned issues. To provide effective search services, mirPub applies text mining techniques on MEDLINE, integrates data from several curated databases and exploits data from its user community following a crowdsourcing approach. Other key features include an interactive visualization service that illustrates intuitively the evolution of miRNA data, tag clouds summarizing the relevance of publications to particular diseases, cell types or tissues and access to TarBase 6.0 data to oversee genes related to miRNA publications. Availability and Implementation: mirPub is freely available at http://www.microrna.gr/mirpub/. Contact: vergoulis@imis.athena-innovation.gr or dalamag@imis.athena-innovation.gr. Supplementary information: Supplementary data are available at Bioinformatics online.

  • Open Access
    Authors: 
    Niki Dimou; Konstantinos D. Tsirigos; Arne Elofsson; Pantelis G. Bagos;
    Publisher: Oxford University Press (OUP)

    Motivation: In the context of genome-wide association studies (GWAS), there is a variety of statistical techniques in order to conduct the analysis, but, in most cases, the underlying genetic model is usually unknown. Under these circumstances, the classical Cochran-Armitage trend test (CATT) is suboptimal. Robust procedures that maximize the power and preserve the nominal type I error rate are preferable. Moreover, performing a meta-analysis using robust procedures is of great interest and has never been addressed in the past. The primary goal of this work is to implement several robust methods for analysis and meta-analysis in the statistical package Stata and subsequently to make the software available to the scientific community. Results: The CATT under a recessive, additive and dominant model of inheritance as well as robust methods based on the Maximum Efficiency Robust Test statistic, the MAX statistic and the MIN2 were implemented in Stata. Concerning MAX and MIN2, we calculated their asymptotic null distributions relying on numerical integration resulting in a great gain in computational time without losing accuracy. All the aforementioned approaches were employed in a fixed or a random effects meta-analysis setting using summary data with weights equal to the reciprocal of the combined cases and controls. Overall, this is the first complete effort to implement procedures for analysis and meta-analysis in GWAS using Stata. Availability and implementation: A Stata program and a web-server are freely available for academic users at http://www.compgen.org/tools/GWAR. Contact: pbagos@compgen.org. Supplementary information: Supplementary data are available at Bioinformatics online.

  • Open Access
    Authors: 
    Lagani, Vincenzo; Athineou, Giorgos; Farcomeni, Alessio; Tsagris, Michail; Tsamardinos, Ioannis;
    Publisher: (:unav)
    Country: Italy
    Project: EC | STATEGRA (306000), EC | CAUSALPATH (617393)

    The statistically equivalent signature (SES) algorithm is a method for feature selection inspired by the principles of constraint-based learning of Bayesian networks. Most of the currently available feature selection methods return only a single subset of features, supposedly the one with the highest predictive power. We argue that in several domains multiple subsets can achieve close to maximal predictive accuracy, and that arbitrarily providing only one has several drawbacks. The SES method attempts to identify multiple, predictive feature subsets whose performances are statistically equivalent. In that respect the SES algorithm subsumes and extends previous feature selection algorithms, like the max-min parent children algorithm. The SES algorithm is implemented in a function of the same name included in the R package MXM, standing for mens ex machina, meaning 'mind from the machine' in Latin. The MXM implementation of SES handles several data analysis tasks, namely classification, regression and survival analysis. In this paper we present the SES algorithm and its implementation, and provide examples of use of the SES function in R. Furthermore, we analyze three publicly available data sets to illustrate the equivalence of the signatures retrieved by SES and to contrast SES against the state-of-the-art feature selection method LASSO. Our results provide initial evidence that the two methods perform comparably well in terms of predictive accuracy and that multiple, equally predictive signatures are actually present in real world data.

  • Open Access
    Authors: 
    Dimitra Karagkouni; Maria D. Paraskevopoulou; Spyros Tastsoglou; Giorgos Skoufos; Anna Karavangeli; Vasileios Pierros; Elissavet Zacharopoulou; Artemis G. Hatzigeorgiou;
    Publisher: Oxford University Press (OUP)

    DIANA-LncBase v3.0 (www.microrna.gr/LncBase) is a reference repository with experimentally supported miRNA targets on non-coding transcripts. Its third version provides approximately half a million entries, corresponding to ∼240 000 unique tissue and cell type specific miRNA–lncRNA pairs. This compilation of interactions is derived from the manual curation of publications and the analysis of >300 high-throughput datasets. miRNA targets are supported by 14 experimental methodologies, applied to 243 distinct cell types and tissues in human and mouse. The largest part of the database is highly confident, AGO-CLIP-derived miRNA-binding events. LncBase v3.0 is the first relevant database to employ a robust CLIP-Seq-guided algorithm, microCLIP framework, to analyze 236 AGO-CLIP-Seq libraries and catalogue ∼370 000 miRNA binding events. The database was redesigned from the ground up, providing new functionalities. Known short variant information, on >67,000 experimentally supported target sites and lncRNA expression profiles in different cellular compartments are catered to users. Interactive visualization plots, portraying correlations of miRNA–lncRNA pairs, as well as lncRNA expression profiles in a wide range of cell types and tissues, are presented for the first time through a dedicated page. LncBase v3.0 constitutes a valuable asset for ncRNA research, providing new insights to the understanding of the still widely unexplored lncRNA functions.

  • Open Access
    Authors: 
    Konstantinos Kyritsis; Christos A. Ouzounis; Lefteris Angelis; Ioannis S. Vizirianakis;
    Publisher: Oxford University Press (OUP)

    Ribosomal genes produce the constituents of the ribosome, one of the most conserved subcellular structures of all cells, from bacteria to eukaryotes, including animals. There are notions that some protein-coding ribosomal genes vary in their roles across species, particularly vertebrates, through the involvement of some in a number of genetic diseases. Based on extensive sequence comparisons and systematic curation, we establish a reference set for ribosomal proteins (RPs) in eleven vertebrate species and quantify their sequence conservation levels. Moreover, we correlate their coordinated gene expression patterns within up to 33 tissues and assess the exceptional role of paralogs in tissue specificity. Importantly, our analysis supported by the development and use of machine learning models strongly proposes that the variation in the observed tissue-specific gene expression of RPs is rather species-related and not due to tissue-based evolutionary processes. The data obtained suggest that RPs exhibit a complex relationship between their structure and function that broadly maintains a consistent expression landscape across tissues, while most of the variation arises from species idiosyncrasies. The latter may be due to evolutionary change and adaptation, rather than functional constraints at the tissue level throughout the vertebrate lineage.

  • Open Access English
    Authors: 
    Mier, Pablo; Paladin, Lisanna; Tamana, Stella; Petrosian, Sophia; Hajdu-Soltész, Borbála; Urbanek, Annika; Gruca, Aleksandra; Plewczynski, Dariusz; Grynberg, Marcin; Bernadó, Pau; +26 more
    Publisher: HAL CCSD
    Countries: Italy, France, Cyprus
    Project: EC | chemREPEAT (648030), EC | IDPfun (778247)

    Abstract: There are multiple definitions for low complexity regions (LCRs) in protein sequences, with all of them broadly considering LCRs as regions with fewer amino acid types compared to an average composition. Following this view, LCRs can also be defined as regions showing composition bias. In this critical review, we focus on the definition of sequence complexity of LCRs and their connection with structure. We present statistics and methodological approaches that measure low complexity (LC) and related sequence properties. Composition bias is often associated with LC and disorder, but repeats, while compositionally biased, might also induce ordered structures. We illustrate this dichotomy, and more generally the overlaps between different properties related to LCRs, using examples. We argue that statistical measures alone cannot capture all structural aspects of LCRs and recommend the combined usage of a variety of predictive tools and measurements. While the methodologies available to study LCRs are already very advanced, we foresee that a more comprehensive annotation of sequences in the databases will enable the improvement of predictions and a better understanding of the evolution and the connection between structure and function of LCRs. This will require the use of standards for the generation and exchange of data describing all aspects of LCRs. Short abstract: There are multiple definitions for low complexity regions (LCRs) in protein sequences. In this critical review, we focus on the definition of sequence complexity of LCRs and their connection with structure. We present statistics and methodological approaches that measure low complexity (LC) and related sequence properties. Composition bias is often associated with LC and disorder, but repeats, while compositionally biased, might also induce ordered structures. We illustrate this dichotomy, plus overlaps between different properties related to LCRs, using examples.

  • Open Access
    Authors: 
    Dionysios Fanidis; Panagiotis Moulos; Vassilis Aidinis;
    Publisher: Springer Science and Business Media LLC

    Idiopathic pulmonary fibrosis is a lethal lung fibroproliferative disease with limited therapeutic options. Differential expression profiling of affected sites has been instrumental in dissecting the pathogenetic mechanisms involved and discovering therapeutic targets. However, there have been limited efforts to comparatively analyse/mine the numerous related publicly available datasets, to fully exploit their potential for the validation/creation of novel research hypotheses. In this context and towards that goal, we present Fibromine, an integrated database and exploration environment comprising consistently re-analysed, manually curated transcriptomic and proteomic pulmonary fibrosis datasets covering a wide range of experimental designs in both patients and animal models. Fibromine can be accessed via an R Shiny application (http://www.fibromine.com/Fibromine) which offers dynamic data exploration and real-time integration functionalities. Moreover, we introduce a novel benchmarking system based on the underlying characteristics of transcriptomic datasets, resulting in dataset accreditation that aims to aid the user in dataset selection. Cell specificity of gene expression can be visualised and/or explored in several scRNA-seq datasets, in an effort to link legacy data with this cutting-edge methodology and pave the way to their integration. Several use case examples are presented that, importantly, can be reproduced on-the-fly by a non-specialist user, the primary target and potential user of this endeavour.

  • Open Access
    Authors: 
    Kleoniki Keklikoglou; Sarah Faulwetter; Eva Chatzinikolaou; Nikitas Michalakis; Irene Filiopoulou; Nikos Minadakis; Emmanouela Panteri; George Perantinos; Alexandros Gougousis; Christos Arvanitidis;
    Publisher: Zenodo
    Project: EC | SYNTHESYS3 (312253), EC | MARBIGEN (264089)

    During recent years, X-ray microtomography (micro-CT) has seen an increasing use in biological research areas, such as functional morphology, taxonomy, evolutionary biology and developmental research. Micro-CT is a technology which uses X-rays to create sub-micron resolution images of external and internal features of specimens. These images can then be rendered in a three-dimensional space and used for qualitative and quantitative 3D analyses. However, the online exploration and dissemination of micro-CT datasets are rarely made available to the public due to their large size and a lack of dedicated online platforms for the interactive manipulation of 3D data. Here, the development of a virtual micro-CT laboratory (Micro-CTvlab) is described, which can be used by everyone who is interested in digitisation methods and biological collections and aims at making the micro-CT data exploration of natural history specimens freely available over the internet. The Micro-CTvlab offers to the user virtual image galleries of various taxa which can be displayed and downloaded through a web application. With a few clicks, accurate, detailed and three-dimensional models of species can be studied and virtually dissected without destroying the actual specimen. The data and functions of the Micro-CTvlab can be accessed either on a normal computer or through a dedicated version for mobile devices.

search
Include:
The following results are related to ELIXIR GR. Are you interested to view more results? Visit OpenAIRE - Explore.
101 Research products, page 1 of 11
  • Open Access
    Authors: 
    Ouzounis, Christos A.;
    Publisher: Elsevier BV

    The genome of SARS-CoV-2, the coronavirus responsible for the Covid-19 pandemic, encodes a number of accessory genes. The longest accessory gene, Orf3a, plays important roles in the virus lifecycle indicated by experimental findings, known polymorphisms, its evolutionary trajectory and a distinct three-dimensional fold. Here we show that supervised, sensitive database searches with Orf3a detect weak, yet significant and highly specific similarities to the M proteins of coronaviruses. The similarity profiles can be used to derive low-resolution three-dimensional models for M proteins based on Orf3a as a structural template. The models also explain the emergence of Orf3a from M proteins and suggest a recent origin across the coronavirus lineage, enunciated by its restricted phylogenetic distribution. This study provides evidence for the common origin of M and Orf3a families and proposes for the first time a working model for the structure of the universally distributed M proteins in coronaviruses, consistent with the properties of both protein families. Graphical abstract

  • Open Access English
    Authors: 
    Paraskevi Manolaki; Paraskevi Manolaki; Georgia Tooulakou; Caroline Urup Byberg; Franziska Eller; Brian K. Sorrell; Maria I. Klapa; Tenna Riis;
    Publisher: Frontiers Media S.A.

    Amphibious plants, living in land-water ecotones, have to cope with challenging and continuously changing growth conditions in their habitats with respect to nutrient and light availability. They have thus evolved a variety of mechanisms to tolerate and adapt to these changes. Therefore, the study of these plants is a major area of ecophysiology and environmental ecological research. However, our understanding of their capacity for physiological adaptation and tolerance remains limited and requires systemic approaches for comprehensive analyses. To this end, in this study, we have conducted a mesocosm experiment to analyze the response of Butomus umbellatus, a common amphibious species in Denmark, to nutrient enrichment and shading. Our study follows a systematic integration of morphological (including plant height, leaf number, and biomass accumulation), ecophysiological (photosynthesis-irradiance responses, leaf pigment content, and C and N content in plant organs), and leaf metabolomic measurements using gas chromatography-mass spectrometry (39 mainly primary metabolites), based on bioinformatic methods. No studies of this type have been previously reported for this plant species. We observed that B. umbellatus responds to nutrient enrichment and light reduction through different mechanisms and were able to identify its nutrient enrichment acclimation threshold within the applied nutrient gradient. Up to that threshold, the morpho-physiological response to nutrient enrichment was profound, indicating fast-growing trends (higher growth rates and biomass accumulation), but only few parameters changed significantly from light to shade [specific leaf area (SLA); quantum yield (φ)]. Metabolomic analysis supported the morpho-physiological results regarding nutrient overloading, indicating also subtle changes due to shading not directly apparent in the other measurements. 
The combined profile analysis revealed leaf metabolite and morpho-physiological parameter associations. In this context, leaf lactate, currently of uncertain role in higher plants, emerged as a shading acclimation biomarker, along with SLA and φ. The study enhances both the ecophysiology methodological toolbox and our knowledge of the adaptive capacity of amphibious species. It demonstrates that the educated combination of physiological with metabolomic measurements using bioinformatic approaches is a promising approach for ecophysiology research, enabling the elucidation of discriminatory metabolic shifts to be used for early diagnosis and even prognosis of natural ecosystem responses to climate change.

  • Open Access English
    Authors: 
    Thanasis Vergoulis; Ilias Kanellos; Nikos Kostoulas; Georgios Georgakilas; Timos Sellis; Artemis G. Hatzigeorgiou; Theodore Dalamagas;
    Publisher: Oxford University Press

    Summary: Identifying, amongst millions of publications available in MEDLINE, those that are relevant to specific microRNAs (miRNAs) of interest based on keyword search faces major obstacles. References to miRNA names in the literature often deviate from standard nomenclature for various reasons, since even the official nomenclature evolves. For instance, a single miRNA name may identify two completely different molecules or two different names may refer to the same molecule. mirPub is a database with a powerful and intuitive interface, which facilitates searching for miRNA literature, addressing the aforementioned issues. To provide effective search services, mirPub applies text mining techniques on MEDLINE, integrates data from several curated databases and exploits data from its user community following a crowdsourcing approach. Other key features include an interactive visualization service that illustrates intuitively the evolution of miRNA data, tag clouds summarizing the relevance of publications to particular diseases, cell types or tissues and access to TarBase 6.0 data to oversee genes related to miRNA publications. Availability and Implementation: mirPub is freely available at http://www.microrna.gr/mirpub/. Contact: rg.noitavonni-anehta.simi@siluogrev or rg.noitavonni-anehta.simi@gamalad Supplementary information: Supplementary data are available at Bioinformatics online.

  • Open Access
    Authors: 
    Niki Dimou; Konstantinos D. Tsirigos; Arne Elofsson; Pantelis G. Bagos;
    Publisher: Oxford University Press (OUP)

    Motivation In the context of genome-wide association studies (GWAS), there is a variety of statistical techniques in order to conduct the analysis, but, in most cases, the underlying genetic model is usually unknown. Under these circumstances, the classical Cochran-Armitage trend test (CATT) is suboptimal. Robust procedures that maximize the power and preserve the nominal type I error rate are preferable. Moreover, performing a meta-analysis using robust procedures is of great interest and has never been addressed in the past. The primary goal of this work is to implement several robust methods for analysis and meta-analysis in the statistical package Stata and subsequently to make the software available to the scientific community. Results The CATT under a recessive, additive and dominant model of inheritance as well as robust methods based on the Maximum Efficiency Robust Test statistic, the MAX statistic and the MIN2 were implemented in Stata. Concerning MAX and MIN2, we calculated their asymptotic null distributions relying on numerical integration resulting in a great gain in computational time without losing accuracy. All the aforementioned approaches were employed in a fixed or a random effects meta-analysis setting using summary data with weights equal to the reciprocal of the combined cases and controls. Overall, this is the first complete effort to implement procedures for analysis and meta-analysis in GWAS using Stata. Availability and implementation A Stata program and a web-server are freely available for academic users at http://www.compgen.org/tools/GWAR. Contact pbagos@compgen.org. Supplementary information Supplementary data are available at Bioinformatics online.

  • Open Access
    Authors: 
    Lagani, Vincenzo; Athineou, Giorgos; Farcomeni, Alessio; Tsagris, Michail; Tsamardinos, Ioannis;
    Publisher: (:unav)
    Country: Italy
    Project: EC | STATEGRA (306000), EC | CAUSALPATH (617393)

    The statistically equivalent signature (SES) algorithm is a method for feature selection inspired by the principles of constraint-based learning of Bayesian networks. Most of the currently available feature selection methods return only a single subset of features, supposedly the one with the highest predictive power. We argue that in several domains multiple subsets can achieve close to maximal predictive accuracy, and that arbitrarily providing only one has several drawbacks. The SES method attempts to identify multiple, predictive feature subsets whose performances are statistically equivalent. In that respect the SES algorithm subsumes and extends previous feature selection algorithms, like the max-min parent children algorithm. The SES algorithm is implemented in an homonym function included in the R package MXM, standing for mens ex machina, meaning 'mind from the machine' in Latin. The MXM implementation of SES handles several data analysis tasks, namely classification, regression and survival analysis. In this paper we present the SES algorithm, its implementation, and provide examples of use of the SES function in R. Furthermore, we analyze three publicly available data sets to illustrate the equivalence of the signatures retrieved by SES and to contrast SES against the state-of-the-art feature selection method LASSO. Our results provide initial evidence that the two methods perform comparably well in terms of predictive accuracy and that multiple, equally predictive signatures are actually present in real world data.

  • Open Access
    Authors: 
    Dimitra Karagkouni; Maria D. Paraskevopoulou; Spyros Tastsoglou; Giorgos Skoufos; Anna Karavangeli; Vasileios Pierros; Elissavet Zacharopoulou; Artemis G. Hatzigeorgiou;
    Publisher: Oxford University Press (OUP)

    Abstract DIANA-LncBase v3.0 (www.microrna.gr/LncBase) is a reference repository with experimentally supported miRNA targets on non-coding transcripts. Its third version provides approximately half a million entries, corresponding to ∼240 000 unique tissue and cell type specific miRNA–lncRNA pairs. This compilation of interactions is derived from the manual curation of publications and the analysis of >300 high-throughput datasets. miRNA targets are supported by 14 experimental methodologies, applied to 243 distinct cell types and tissues in human and mouse. The largest part of the database is highly confident, AGO-CLIP-derived miRNA-binding events. LncBase v3.0 is the first relevant database to employ a robust CLIP-Seq-guided algorithm, microCLIP framework, to analyze 236 AGO-CLIP-Seq libraries and catalogue ∼370 000 miRNA binding events. The database was redesigned from the ground up, providing new functionalities. Known short variant information, on >67,000 experimentally supported target sites and lncRNA expression profiles in different cellular compartments are catered to users. Interactive visualization plots, portraying correlations of miRNA–lncRNA pairs, as well as lncRNA expression profiles in a wide range of cell types and tissues, are presented for the first time through a dedicated page. LncBase v3.0 constitutes a valuable asset for ncRNA research, providing new insights to the understanding of the still widely unexplored lncRNA functions.

  • Open Access
    Authors: 
    Konstantinos Kyritsis; Christos A. Ouzounis; Lefteris Angelis; Ioannis S. Vizirianakis;
    Publisher: Oxford University Press (OUP)

    Abstract Ribosomal genes produce the constituents of the ribosome, one of the most conserved subcellular structures of all cells, from bacteria to eukaryotes, including animals. There are notions that some protein-coding ribosomal genes vary in their roles across species, particularly vertebrates, through the involvement of some in a number of genetic diseases. Based on extensive sequence comparisons and systematic curation, we establish a reference set for ribosomal proteins (RPs) in eleven vertebrate species and quantify their sequence conservation levels. Moreover, we correlate their coordinated gene expression patterns within up to 33 tissues and assess the exceptional role of paralogs in tissue specificity. Importantly, our analysis supported by the development and use of machine learning models strongly proposes that the variation in the observed tissue-specific gene expression of RPs is rather species-related and not due to tissue-based evolutionary processes. The data obtained suggest that RPs exhibit a complex relationship between their structure and function that broadly maintains a consistent expression landscape across tissues, while most of the variation arises from species idiosyncrasies. The latter may be due to evolutionary change and adaptation, rather than functional constraints at the tissue level throughout the vertebrate lineage.

  • Open Access English
    Authors: 
    Mier, Pablo; Paladin, Lisanna; Tamana, Stella; Petrosian, Sophia; Hajdu-Soltész, Borbála; Urbanek, Annika; Gruca, Aleksandra; Plewczynski, Dariusz; Grynberg, Marcin; Bernadó, Pau; +26 more
    Publisher: HAL CCSD
    Countries: Italy, France, Cyprus
    Project: EC | chemREPEAT (648030), EC | IDPfun (778247)

    Abstract There are multiple definitions for low complexity regions (LCRs) in protein sequences, with all of them broadly considering LCRs as regions with fewer amino acid types compared to an average composition. Following this view, LCRs can also be defined as regions showing composition bias. In this critical review, we focus on the definition of sequence complexity of LCRs and their connection with structure. We present statistics and methodological approaches that measure low complexity (LC) and related sequence properties. Composition bias is often associated with LC and disorder, but repeats, while compositionally biased, might also induce ordered structures. We illustrate this dichotomy, and more generally the overlaps between different properties related to LCRs, using examples. We argue that statistical measures alone cannot capture all structural aspects of LCRs and recommend the combined usage of a variety of predictive tools and measurements. While the methodologies available to study LCRs are already very advanced, we foresee that a more comprehensive annotation of sequences in the databases will enable the improvement of predictions and a better understanding of the evolution and the connection between structure and function of LCRs. This will require the use of standards for the generation and exchange of data describing all aspects of LCRs. Short abstract There are multiple definitions for low complexity regions (LCRs) in protein sequences. In this critical review, we focus on the definition of sequence complexity of LCRs and their connection with structure. We present statistics and methodological approaches that measure low complexity (LC) and related sequence properties. Composition bias is often associated with LC and disorder, but repeats, while compositionally biased, might also induce ordered structures. We illustrate this dichotomy, plus overlaps between different properties related to LCRs, using examples.

  • Open Access
    Authors: 
    Dionysios Fanidis; Panagiotis Moulos; Vassilis Aidinis;
    Publisher: Springer Science and Business Media LLC

    Idiopathic pulmonary fibrosis is a lethal lung fibroproliferative disease with limited therapeutic options. Differential expression profiling of affected sites has been instrumental in dissecting the pathogenetic mechanisms involved and in discovering therapeutic targets. However, there have been limited efforts to comparatively analyse/mine the numerous related publicly available datasets, to fully exploit their potential for the validation/creation of novel research hypotheses. In this context and towards that goal, we present Fibromine, an integrated database and exploration environment comprising consistently re-analysed, manually curated transcriptomic and proteomic pulmonary fibrosis datasets covering a wide range of experimental designs in both patients and animal models. Fibromine can be accessed via an R Shiny application (http://www.fibromine.com/Fibromine) which offers dynamic data exploration and real-time integration functionalities. Moreover, we introduce a novel benchmarking system based on the underlying characteristics of transcriptomic datasets, resulting in a dataset accreditation that aims to aid the user in dataset selection. Cell specificity of gene expression can be visualised and/or explored in several scRNA-seq datasets, in an effort to link legacy data with this cutting-edge methodology and pave the way to their integration. Several use case examples are presented that, importantly, can be reproduced on-the-fly by a non-specialist user, the primary target and potential user of this endeavour.

  • Open Access
    Authors: 
    Kleoniki Keklikoglou; Sarah Faulwetter; Eva Chatzinikolaou; Nikitas Michalakis; Irene Filiopoulou; Nikos Minadakis; Emmanouela Panteri; George Perantinos; Alexandros Gougousis; Christos Arvanitidis;
    Publisher: Zenodo
    Project: EC | SYNTHESYS3 (312253), EC | MARBIGEN (264089)

    During recent years, X-ray microtomography (micro-CT) has seen increasing use in biological research areas, such as functional morphology, taxonomy, evolutionary biology and developmental research. Micro-CT is a technology which uses X-rays to create sub-micron resolution images of external and internal features of specimens. These images can then be rendered in a three-dimensional space and used for qualitative and quantitative 3D analyses. However, micro-CT datasets are rarely made available to the public for online exploration and dissemination, due to their large size and a lack of dedicated online platforms for the interactive manipulation of 3D data. Here, the development of a virtual micro-CT laboratory (Micro-CTvlab) is described, which can be used by everyone who is interested in digitisation methods and biological collections and aims at making the micro-CT data exploration of natural history specimens freely available over the internet. The Micro-CTvlab offers the user virtual image galleries of various taxa which can be displayed and downloaded through a web application. With a few clicks, accurate, detailed and three-dimensional models of species can be studied and virtually dissected without destroying the actual specimen. The data and functions of the Micro-CTvlab can be accessed either on a normal computer or through a dedicated version for mobile devices.