The centromere is an essential chromosomal region that ensures the proper segregation of chromosomes. However, information about the centromeres of kiwifruit is limited, mainly because centromeric regions consist of highly repetitive sequences that impede assembly from short DNA sequencing reads (Nurk et al. 2022). Advances in sequencing technology now make it possible to assemble centromeric regions. To identify the locations and sequence features of the centromeres, we used Tandem Repeats Finder (TRF) to search for tandem repeats in our assemblies and retained only repeat monomers with lengths of 100 to 200 bp. CD-HIT (Fu et al. 2012) was then used to cluster these monomers, which reduces sequence redundancy and improves the precision of centromere localization by sequence similarity search. Regions in which these repeats were continuous and highly frequent were considered approximate centromeric sequences. In this way, we determined the centromere locations of all chromosomes in the two haplotype assemblies. The centromere boundaries were at similar chromosomal positions in the two haplotype genomes, and centromere length ranged from 217,369 bp to 1,893,971 bp in MDHAPA and from 112,182 bp to 1,168,845 bp in MDHAPB (Supplementary Table 10). In addition, 147 and 151 new genes were predicted in the centromeric regions of MDHAPA and MDHAPB, respectively (Supplementary Tables 11 and 12). To verify the accuracy of the centromeric regions, we analyzed gene density, repeat distribution, and sequence similarity along each chromosome (
Fig. 3a). The repeat distribution showed that class I transposable elements (retrotransposons) were more abundant in the centromeres, whereas class II elements (DNA transposons) were more evenly distributed across the genome, a pattern similar to that reported in other species such as Brassica (Perumal et al. 2020) and A. chinensis (Yue et al. 2022). In addition, the centromeric regions had lower gene density and lower sequence similarity than other regions of the chromosomes (
Fig. 3a). Finally, the Hi-C heatmap also supported the inferred centromere locations (Fig. 3b).
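As a rough illustration of the localization idea described above (keep tandem repeats with 100 to 200 bp monomers, then look for regions where they occur continuously and at high frequency), a minimal Python sketch is given here. It is not the pipeline used in this study: the record layout `(chrom, start, end, monomer_length)`, the 10-kb window, and the 50% coverage threshold are all illustrative assumptions, the CD-HIT clustering and similarity search steps are omitted, and the input intervals are assumed to be non-overlapping.

```python
from collections import defaultdict


def filter_monomers(repeats, min_len=100, max_len=200):
    """Keep tandem repeats whose monomer length lies in [min_len, max_len].

    Each record is a hypothetical tuple: (chrom, start, end, monomer_length).
    """
    return [r for r in repeats if min_len <= r[3] <= max_len]


def window_coverage(repeats, window=10_000):
    """Return {chrom: {window_index: covered_bp}} for half-open [start, end).

    Assumes intervals do not overlap; overlapping records would be counted twice.
    """
    cov = defaultdict(lambda: defaultdict(int))
    for chrom, start, end, _ in repeats:
        pos = start
        while pos < end:
            idx = pos // window
            stop = min(end, (idx + 1) * window)  # clip to the current window
            cov[chrom][idx] += stop - pos
            pos = stop
    return cov


def candidate_region(cov_chrom, window=10_000, min_frac=0.5):
    """Return (start, end) of the longest run of consecutive repeat-dense windows.

    A window is 'dense' when repeats cover at least min_frac of it (assumed cutoff).
    Returns None when no window passes the cutoff.
    """
    dense = sorted(i for i, bp in cov_chrom.items() if bp >= min_frac * window)
    best = None
    run_start = prev = None
    for i in dense:
        if prev is None or i != prev + 1:
            run_start = i  # a gap ends the previous run
        prev = i
        if best is None or i - run_start > best[1] - best[0]:
            best = (run_start, i)
    if best is None:
        return None
    return best[0] * window, (best[1] + 1) * window


if __name__ == "__main__":
    # Toy records, not real data.
    toy = [
        ("chr1", 0, 5_000, 150),
        ("chr1", 20_000, 40_000, 170),
        ("chr1", 40_000, 48_000, 120),
        ("chr1", 50_000, 51_000, 30),  # monomer too short, filtered out
    ]
    kept = filter_monomers(toy)
    coverage = window_coverage(kept)
    print(candidate_region(coverage["chr1"]))
```

In practice, a candidate region found this way would still need the kind of independent support described in the text, such as low gene density, repeat composition, and the Hi-C contact pattern.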