The M. alba genome and the genomes of its two parents were sequenced on the Illumina HiSeq X Ten platform. K-mer analysis indicated that M. alba has a large genome of approximately 1.8 Gb, with 59.1% repetitive elements and high heterozygosity (4.76%) (Figure S2).
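K-mer genome-size estimates of this kind are commonly obtained by dividing the number of k-mers attributable to genomic sequence by the depth of the main k-mer coverage peak. The sketch below shows only that arithmetic, on a toy histogram. Several details are assumptions rather than anything stated in the source: the helper name, the error cut-off `min_depth`, and the histogram format (depth mapped to the number of distinct k-mers, as in the two-column output of a counter such as Jellyfish `histo`). The source does not say which k-mer tool was used. Heterozygosity and repeat content require fitting a model to the whole histogram (as GenomeScope does), which this sketch does not attempt.

```python
def estimate_genome_size(hist, min_depth=5):
    """Toy k-mer genome-size estimate (hypothetical helper, not the authors' pipeline).

    hist: dict mapping k-mer depth -> number of distinct k-mers at that depth.
    Depths below min_depth are assumed to be sequencing-error k-mers.
    Returns (estimated genome size in bases, peak depth used).
    """
    usable = {depth: n for depth, n in hist.items() if depth >= min_depth}
    # Total k-mer occurrences attributed to genomic sequence.
    total_kmers = sum(depth * n for depth, n in usable.items())
    # Main peak: the depth with the most distinct k-mers. In a highly
    # heterozygous genome this may be the heterozygous peak (about half the
    # homozygous depth), which would inflate the estimate roughly twofold.
    peak_depth = max(usable, key=usable.get)
    return total_kmers / peak_depth, peak_depth


# Toy histogram: error k-mers at depths 1-2, one genomic peak at depth 30.
toy_hist = {1: 1000, 2: 200, 25: 40, 30: 100, 35: 40}
size, peak = estimate_genome_size(toy_hist)
print(f"peak depth = {peak}, estimated size = {size:.0f} bp")
# prints "peak depth = 30, estimated size = 180 bp"
```

The caveat in the comment matters for a sample as heterozygous as M. alba (4.76%). Dividing by the heterozygous peak instead of the homozygous one would roughly double the size estimate.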
M. champaca has a large genome of approximately 2.24 Gb with low heterozygosity (0.38%) (Figure S3).
M. montana has a large genome of approximately 1.47 Gb with low heterozygosity (0.95%) (Figure S4). We used the parental Illumina reads to bin the 261 Gb of Nanopore reads from the hybrid by parental origin: 112 Gb of Nanopore reads were assigned to M. champaca and 130 Gb to M. montana for subsequent haplotype genome assembly (Table S1). Similarly, 61 Gb of PacBio HiFi reads were assigned to M. champaca and 86 Gb to M. montana. The initial assemblies were 2.55 Gb for MC (M. champaca) and 2.42 Gb for MM (M. montana). Duplicated sequences were removed with Purge_dups, reducing the assembly sizes to 2.23 Gb and 2.19 Gb, respectively. The assemblies were then polished with the raw PacBio reads through three rounds of minimap2 + Racon, yielding final sizes of 2.19 Gb and 2.13 Gb, respectively. The contigs and scaffolds of the MC and MM subgenomes were further scaffolded into 19 chromosomes using Hi-C data, and the chromosome-anchored sequences totaled 2.03 Gb (97.19%) and 2.06 Gb (97.96%), respectively (Figures S5 and S6, Fig. 1a,
Table 1). Illumina (second-generation) reads of M. alba mapped to the MC and MM haplotype genomes at rates of 99.19% and 98.32%, respectively. As a control, we mapped second-generation sequencing reads of Litchi chinensis, Zea mays, Magnolia biondii and Liriodendron chinense to the M. alba genome, obtaining mapping rates of 5.57%, 11.61%, 86.20% and 51.98%, respectively, all lower than those of the M. alba reads. This indicates a high similarity between M. alba and the MC and MM haplotype genomes. Both assemblies are contiguous (mean contig N50, 12.425 Mb; scaffold N50, 116.01 Mb) and complete (mean BUSCO completeness, 95.5%) (Table S2,
Fig. 1c). The corresponding Illumina data were aligned to the haplotype genomes with the Burrows–Wheeler Aligner (BWA), giving mapping rates of 95.68% (MC) and 96.72% (MM). Merqury was used to assess the consensus quality value (QV) and k-mer completeness of the M. alba genome assembly, which were 34.03 and 96.22%, respectively. These results indicate that the assembled M. alba genome is highly complete and accurate.
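Two of the metrics quoted above can be made concrete with a short sketch.

- **N50.** Contig or scaffold N50 is the length L such that sequences of length at least L together contain at least half of the assembly.
- **QV.** Merqury's QV is a Phred-scaled error estimate, QV = −10·log10(E). The per-base error rate E is derived from assembly k-mers that are absent from the reads: E = 1 − (1 − K_asm/K_total)^(1/k), where K_asm counts those assembly-only k-mers and K_total all k-mers in the assembly.

The function names and toy inputs below are illustrative assumptions, not part of the authors' workflow.

```python
import math


def n50(lengths):
    """Length L such that sequences of length >= L hold at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def merqury_style_qv(asm_only_kmers, total_asm_kmers, k):
    """QV from k-mers found only in the assembly (Merqury-style formula).

    asm_only_kmers: assembly k-mers absent from the read set (K_asm)
    total_asm_kmers: all k-mers in the assembly (K_total)
    """
    if asm_only_kmers == 0:
        return math.inf  # no detectable errors
    # A k-mer is error-free only if all k of its bases are correct.
    error_rate = 1 - (1 - asm_only_kmers / total_asm_kmers) ** (1 / k)
    return -10 * math.log10(error_rate)


def qv_to_error_rate(qv):
    """Invert the Phred scale: QV = -10 * log10(error rate)."""
    return 10 ** (-qv / 10)


print(n50([2, 3, 4, 5, 6]))              # 5
print(f"{qv_to_error_rate(34.03):.2e}")  # 3.95e-04
```

Under this conversion, the reported QV of 34.03 corresponds to an estimated error rate of about 4 × 10⁻⁴, or roughly one error every 2.5 kb.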