Release Information

Latest version: 3.0 (Last modified at 2023-08-15)

Release 3.0

Last modified at 2023-08-15

      Data collection

  • Fifteen new single-cell or single-nucleus RNA datasets from recently published articles were collected for Release 3.0. Notably, this release adds single-cell or single-nucleus RNA data for six new species, including Brassica rapa (PRJCA009630, PRJCA013085), Manihot esculenta (PRJNA895163), Medicago truncatula (PRJCA012129, PRJNA868047), Nepeta tenuifolia (PRJNA743551), and Gossypium hirsutum (PRJNA600131). In total, 29,269 new marker genes were identified from these datasets.
  • Four spatial transcriptomics (ST) datasets from four species were added to Release 3.0: Arabidopsis thaliana (CNP0002618), Oryza sativa (our unpublished ST data), Glycine max (PRJCA009893), and Phalaenopsis aphrodite (PRJNA813957). Glycine max and Phalaenopsis aphrodite are new to Release 3.0. These ST data were re-analyzed to identify new marker genes for cell type annotation. In total, 11,809 marker genes were identified from them.
  • In total, 112,657 marker genes from fifteen species were collected for Release 3.0 (detailed information can be found in the table below).
Species | Tissues | Cell types | Marker genes
Arabidopsis thaliana | 22 | 275 | 23,154
Oryza sativa | 8 | 75 | 12,178
Solanum lycopersicum | 7 | 34 | 4,426
Zea mays | 11 | 82 | 13,196
Fragaria vesca | 1 | 9 | 5,779
Populus | 4 | 33 | 9,904
Nicotiana attenuata | 1 | 5 | 1,723
Lemna minuta | 1 | 12 | 4,238
Brassica rapa | 2 | 11 | 11,004
Manihot esculenta | 1 | 15 | 5,419
Medicago truncatula | 1 | 18 | 2,643
Nepeta tenuifolia | 1 | 7 | 5,342
Gossypium hirsutum | 1 | 3 | 3,716
Glycine max | 1 | 3 | 93
Phalaenopsis aphrodite | 1 | 8 | 9,543

      New page

  • We developed a new webpage for searching marker genes and displaying their expression patterns in spatial locations. Click here to jump to the page.

      Genome version

  • Arabidopsis thaliana, TAIR11 reference genome downloaded from TAIR (https://www.arabidopsis.org/);
  • Oryza sativa, Nipponbare (IRGSP-1.0) and 93-11 (ASM465v1) reference genomes from the Ensembl Plants database;
  • Zea mays, B73 V4 reference genome from MaizeGDB (https://maizegdb.org/);
  • Solanum lycopersicum, ITAG4 reference genome from Sol Genomics Network (https://solgenomics.net/organism/solanum_lycopersicum/genome);
  • Fragaria vesca, Fragaria vesca v4.01 reference genome from Genome Database for Rosaceae (https://www.rosaceae.org/species/fragaria_vesca/genome_v4.0.a1);
  • Populus, P.trichocarpa_v4.1 reference genome from NCBI;
  • Nicotiana attenuata, NIATTr2 reference genome from NCBI;
  • Lemna minuta, reference genome from CoGe (https://genomevolution.org/);
  • Gossypium hirsutum, TM‐1 reference genome from CottonGen Database (http://www.cottongen.org);
  • Brassica rapa, Chinese cabbage A03 v1 reference genome from CCEMD (www.bioinformaticslab.cn/EMSmutation/home);
  • Medicago truncatula, MtrunA17r5.0 with annotation r1.7 reference genome from NCBI;
  • Manihot esculenta, Manihot esculenta reference genome from NCBI;
  • Phalaenopsis aphrodite, Phalaenopsis aphrodite reference genome from Phalaenopsis aphrodite Genome Resources (https://orchidstra2.abrc.sinica.edu.tw/orchidstra2/pagenome/padownload.php);
  • Glycine max, Glycine_max_v2.1 reference genome from NCBI;
  • Nepeta tenuifolia, Nepeta tenuifolia reference genome from NCBI.

      Not selected data

    Accession | Species | Title | Reasons
    PRJCA012129 | Bombax ceiba | Single-cell RNA landscape of the special fiber initiation process in Bombax ceiba | No available high-quality B. ceiba reference genome for cellranger mkref.
    CRR602489 | Triticum aestivum | Asymmetric gene expression and cell-type-specific regulatory networks in the root of bread wheat revealed by single-cell multiomics analysis | No available high-quality T. aestivum reference genome for cellranger mkref.
    GSE208433 | Oryza sativa | Single-nucleus sequencing deciphers developmental trajectories in rice pistils | The number of genes and gene expression level are too low to meet the quality control standards.
    PRJNA847210 | Gossypium hirsutum | Cell-specific clock-controlled gene expression program regulates rhythmic fiber cell growth in cotton | The number of genes and gene expression level are too low to meet the quality control standards.
    GSE212230 | Arabidopsis thaliana | Brassinosteroid gene regulatory networks at cellular resolution in the Arabidopsis root | The number of genes and gene expression level are too low to meet the quality control standards.

      Spatial transcriptomics data analysis workflow

  • We downloaded the expression data, spatial coordinates, and annotation information from the download link provided in each article. The expression data had already been segmented at the bin or single-cell level. We visualized the expression data in situ and annotated the data with cell types. After annotation, we used the previously established workflow to identify new marker genes for cell type annotation.
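
As a rough illustration of how the three downloaded inputs fit together, the sketch below joins per-bin expression values, spatial coordinates, and cell type annotations on a shared bin identifier, then averages one gene's expression per cell type. All names (`join_spatial_records`, `mean_expression_by_cell_type`, the bin IDs, the gene ID) and the dictionary layout are assumptions made for this example; the actual file formats differ between the source articles, and this is not the code used for PlantscRNAdb.

```python
from collections import defaultdict


def join_spatial_records(expression, coordinates, annotation):
    """Join three {bin_id: ...} mappings, keeping bins present in all of them."""
    shared = expression.keys() & coordinates.keys() & annotation.keys()
    return {
        bin_id: {
            "xy": coordinates[bin_id],          # (x, y) position of the bin or cell
            "cell_type": annotation[bin_id],    # label assigned during annotation
            "expr": expression[bin_id],         # {gene_id: expression value}
        }
        for bin_id in sorted(shared)
    }


def mean_expression_by_cell_type(records, gene):
    """Average the expression of `gene` over the bins of each cell type."""
    values = defaultdict(list)
    for rec in records.values():
        # A gene missing from a bin is treated as not expressed (0.0).
        values[rec["cell_type"]].append(rec["expr"].get(gene, 0.0))
    return {cell_type: sum(v) / len(v) for cell_type, v in values.items()}


# Toy input with hypothetical bin and gene identifiers.
expression = {"b1": {"G1": 2.0}, "b2": {"G1": 4.0}, "b3": {}, "b4": {"G1": 1.0}}
coordinates = {"b1": (0, 0), "b2": (0, 1), "b3": (1, 0)}
annotation = {"b1": "epidermis", "b2": "epidermis", "b3": "cortex", "b4": "cortex"}

records = join_spatial_records(expression, coordinates, annotation)
per_type = mean_expression_by_cell_type(records, "G1")
```

Bins that lack coordinates or an annotation (here `b4`) are dropped by the join, so only fully described bins contribute to the per-cell-type averages.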

(by Nianmin Shang, Yaqian Lu, Hongyu Chen, Qinjie Chu, Longjiang Fan)

Release 2.0

Last modified at 2022-05-24

  • Single-cell RNA data from four new species were added in this release: Lemna minuta (SAMN19243672), Nicotiana attenuata (PRJNA796301), Populus alba (PRJNA703312, PRJCA005543), and Fragaria vesca (CRA004848). Poplar has two single-cell datasets, and each of the other species has one.
  • Thirty-one new single-cell datasets for the original four species were added in Release 2.0: twenty-one for Arabidopsis thaliana, three for Oryza sativa, five for Zea mays, and two for Solanum lycopersicum.
  • A total of 69,462 marker genes from eight species were collected in Release 2.0 (detailed information can be found in the following table).
Species | Tissues | Cell types | Marker genes
Arabidopsis thaliana | 19 | 248 | 20,862
Oryza sativa | 7 | 71 | 11,439
Solanum lycopersicum | 7 | 34 | 4,426
Zea mays | 11 | 75 | 11,599
Fragaria vesca | 1 | 9 | 1,723
Populus | 2 | 25 | 8,406
Nicotiana attenuata | 1 | 5 | 1,723
Lemna minuta | 1 | 12 | 4,238
  • Arabidopsis thaliana single-cell data were re-analyzed using the Araport11 annotation files. Note that the genome version was TAIR10 and the annotation version was Araport11 (downloaded from https://plants.ensembl.org/index.html).

    Release 1.2

    Last modified at 2021-04-07

    • A total of 26,326 marker genes from four species were collected in Release 1.2 (detailed information can be found in the following table).
    • Attention: In previous versions of PlantscRNAdb, the same cell type at different developmental stages was classified as different cell types. For example, 'Chalazal Endosperm at Globular and Early Heart Stages' and 'Chalazal Endosperm at Preglobular Stage' were listed as different cell types. In this version, such cell types are considered one cell type, so the number of cell types is lower than in previous versions.
    Species | Tissues | Cell types | Marker genes
    Arabidopsis thaliana | 10 | 79 | 14,922
    Oryza sativa | 5 | 35 | 5,428
    Solanum lycopersicum * | 5 | 25 | 75
    Zea mays | 9 | 42 | 5,901

    * scRNA-seq data from Solanum lycopersicum were not publicly available, so the number of tomato marker genes is much lower than for the other three species. We will update these data as soon as public scRNA-seq data from Solanum lycopersicum become available.
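
The merging of stage-specific cell types described for Release 1.2 can be pictured as a simple name normalization. The sketch below is an assumption for illustration only: the regular expression and the function names `base_cell_type` and `count_cell_types` are hypothetical and are not taken from the PlantscRNAdb pipeline, which may well have merged cell types by manual curation.

```python
import re

# Hypothetical pattern: a trailing " at ... Stage" or " at ... Stages" qualifier.
_STAGE_SUFFIX = re.compile(r"\s+at\s+.*\bStages?$", re.IGNORECASE)


def base_cell_type(name):
    """Strip a trailing developmental-stage qualifier from a cell type name."""
    return _STAGE_SUFFIX.sub("", name.strip())


def count_cell_types(names):
    """Count distinct cell types after merging developmental stages."""
    return len({base_cell_type(n) for n in names})


names = [
    "Chalazal Endosperm at Globular and Early Heart Stages",
    "Chalazal Endosperm at Preglobular Stage",
    "Root hair",  # hypothetical name without a stage qualifier
]
merged_count = count_cell_types(names)
```

With this normalization the two Chalazal Endosperm entries collapse into one cell type, which is why the cell type counts in Release 1.2 are lower than in earlier releases even though the marker gene totals are unchanged.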

    Release 1.1

    Last modified at 2021-03-01

    • Four datasets (10.1101/2020.11.25.397919, 10.1016/j.molp.2021.01.001, 10.1016/j.devcel.2020.12.015, 10.1093/plcell/koaa055) were added in this version. A detailed list of reference papers can be found here.
    • A total of 26,326 marker genes from four species were collected in Release 1.1 (detailed information can be found in the following table).
    Species | Tissues | Cell types | Marker genes
    Arabidopsis thaliana | 11 | 152 | 14,922
    Oryza sativa | 5 | 43 | 5,428
    Solanum lycopersicum * | 6 | 30 | 75
    Zea mays | 9 | 56 | 5,901

    * scRNA-seq data from Solanum lycopersicum were not publicly available, so the number of tomato marker genes is much lower than for the other three species. We will update these data as soon as public scRNA-seq data from Solanum lycopersicum become available.

    Release 1.0

    Last modified at 2021-01-04

      Data collection

    • Datasets from four species were included: Arabidopsis thaliana, Oryza sativa, Solanum lycopersicum and Zea mays.
    • 23 plant single-cell RNA-seq datasets/papers were used. A detailed list can be found here.
    • Eight datasets (GSE114615, GSE121619, GSE122687, GSE123013, GSE123818, GSE141730, PRJNA323955, PRJNA577177) in Arabidopsis thaliana, two datasets (GSM4363200, GSM4363201) in Oryza sativa, and one dataset (PRJNA637882) in Zea mays were also used to display the gene expression of each cell, which is shown on the marker gene search result page (example) and on the JBrowse page (example).
    • A total of 24,573 marker genes were collected from four species (detailed information can be found in the following table).
    Species | Tissues | Cell types | Marker genes
    Arabidopsis thaliana | 11 | 152 | 14,506
    Oryza sativa | 5 | 51 | 5,441
    Solanum lycopersicum * | 5 | 29 | 74
    Zea mays | 7 | 41 | 4,552

    * scRNA-seq data from Solanum lycopersicum were not publicly available, so the number of tomato marker genes is much lower than for the other three species. We will update these data as soon as public scRNA-seq data from Solanum lycopersicum become available.

      Genome version

    The genome versions used for JBrowse are:
    • Arabidopsis thaliana, TAIR10 reference genome downloaded from TAIR (https://www.arabidopsis.org/);
    • Zea mays, B73 V4 reference genome from MaizeGDB (https://maizegdb.org/);
    • Oryza sativa, Nipponbare (IRGSP-1.0) and 93-11 (ASM465v1) reference genomes from the Ensembl Plants database.

      Bioinformatics workflow

    In brief, Fastq-dump, CellRanger and Seurat were used to process the raw scRNA-seq data:
    • Fastq-dump (v2.9.6) was used to convert the SRA data into the corresponding fastq files, which were then renamed to XX__S1_L001_I1_001.fastq.gz, XX__S1_L001_R1_001.fastq.gz, and XX__S1_L001_R2_001.fastq.gz (XX is the accession number).
    • Raw reads were then demultiplexed and mapped to the reference genome with the 10X Genomics CellRanger (v4.0.0) pipeline using default parameters.
    • All downstream single-cell analyses were performed using Seurat (v3.0.0).
    • In brief, the gene-cell matrices were loaded into the Seurat package, implemented in R (v4.0.2). To remove low-quality cells, cells with fewer than 200 unique genes were filtered out, and only genes expressed in at least three cells were kept. The Seurat SCTransform function was used to scale and normalize the raw data. For principal component (PC) analysis, the scaled data were reduced to 50 approximate PCs (npcs = 50). Clusters were then identified using the Seurat function ‘FindClusters’ with ‘resolution = 1.0’. For multiple samples, Seurat was also used to combine the datasets into a single dataset using Canonical Correlation Analysis with the IntegrateData function. To align cell population clusters from the unsupervised scRNA-seq analysis to known cell types, we assessed 1) the expression of known cell type-specific marker genes identified from PlantscRNAdb, 2) Spearman’s and Pearson’s correlations with expression profiles of cell populations isolated from reporter gene lines, and 3) Index of Cell Identity (ICI) scores. Finally, the Seurat FindAllMarkers function was used to identify markers that were up-regulated in each cluster versus all other cells (average FC ≥ 1 and adjusted P-value ≤ 0.05), considering only the control group data. In addition, a marker gene had to be expressed in less than 25% of the cells in the corresponding cluster.
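
The file naming and the numeric thresholds in the workflow above can be illustrated in plain Python. This is only a sketch, not the Seurat code used for PlantscRNAdb: the function names (`cellranger_fastq_names`, `filter_matrix`, `select_markers`), the dictionary-based data layout, and the `avg_fc` / `p_adj` field names are hypothetical. The order of the two filters (cells first, then genes) is also an assumption, and the fold change and adjusted P-value are assumed to have been computed beforehand, for example by Seurat FindAllMarkers.

```python
def cellranger_fastq_names(accession):
    """Return the I1/R1/R2 fastq file names listed in the workflow for one accession."""
    return [f"{accession}__S1_L001_{read}_001.fastq.gz" for read in ("I1", "R1", "R2")]


def filter_matrix(counts, min_genes=200, min_cells=3):
    """Filter a {cell: {gene: count}} matrix with the two QC thresholds.

    Cells with fewer than `min_genes` detected genes are removed first, then
    genes detected in fewer than `min_cells` of the remaining cells are removed.
    """
    # Keep cells with at least `min_genes` detected (non-zero) genes.
    kept_cells = {
        cell: {gene: n for gene, n in genes.items() if n > 0}
        for cell, genes in counts.items()
        if sum(1 for n in genes.values() if n > 0) >= min_genes
    }
    # Count the number of remaining cells in which each gene is detected.
    detected_in = {}
    for genes in kept_cells.values():
        for gene in genes:
            detected_in[gene] = detected_in.get(gene, 0) + 1
    # Keep genes detected in at least `min_cells` cells.
    return {
        cell: {gene: n for gene, n in genes.items() if detected_in[gene] >= min_cells}
        for cell, genes in kept_cells.items()
    }


def select_markers(candidates, min_fc=1.0, max_padj=0.05):
    """Keep candidates with average FC >= min_fc and adjusted P-value <= max_padj."""
    return [m for m in candidates if m["avg_fc"] >= min_fc and m["p_adj"] <= max_padj]


# Toy data; thresholds are lowered so the tiny example is not filtered away entirely.
toy = {"c1": {"g1": 1, "g2": 2}, "c2": {"g1": 3, "g2": 0}, "c3": {"g1": 1}}
filtered = filter_matrix(toy, min_genes=1, min_cells=3)

candidates = [
    {"gene": "a", "avg_fc": 1.5, "p_adj": 0.01},
    {"gene": "b", "avg_fc": 0.5, "p_adj": 0.01},
    {"gene": "c", "avg_fc": 2.0, "p_adj": 0.20},
]
markers = select_markers(candidates)
```

In the toy example only `g1` is detected in all three cells, so it is the only gene kept, and only candidate `a` passes both marker cutoffs.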

      Reference

    A total of 23 datasets (10.1016/j.molp.2020.06.010, 10.1101/2020.09.08.288498, 10.1101/2020.10.02.324327, 10.1101/2020.06.29.178863, 10.1126/science.aay4970, 10.1016/j.devcel.2019.02.022, 10.1105/tpc.18.00785, 10.1007/s00497-018-00355-4, 10.1104/pp.18.01482, 10.1016/j.celrep.2019.04.054, 10.1016/j.celrep.2019.06.041, 10.1016/j.molp.2019.04.004, 10.1016/j.cell.2016.04.046, 10.1186/s13059-015-0580-x, 10.1101/2020.08.25.267476, 10.1186/s13059-020-02094-0, 10.1101/2020.09.20.305029, 10.1101/2020.08.25.267427, 10.1126/science.aav6428, 10.1101/2020.01.30.926329, 10.1101/2020.11.14.382812, 10.1016/j.molp.2020.12.014, 10.1093/plcell/koaa060) were used. A detailed list can be found here.