Frequently Asked Questions

About Database
What's TE-TSS? TE-TSS is a database of transposable element (TE)-derived transcription start sites (TSSs) in human and mouse.
In the browser, you can view TSS usage in different samples from various cell types and tissue types. This allows you to analyze and explore tissue-specific gene expression patterns and regulatory elements at the transcriptional level.
For TE-derived TSSs, our database analyzes the evolutionary and functional aspects of the TE-TSS region. For the evolutionary analysis, we use BLAT to detect homologous regions in the aligned species. Through the genome browser, you can explore the sequence features of these homologous regions, including the presence of TEs and whether they act as TSSs. For the functional analysis, we examine the conservation of transcription factor motifs within the TE-TSS region during evolution, allowing you to discover the motifs that have been retained across different species.
How can I access TE-TSS data? TE-TSS contains RNA-seq data from 1321 humans and 1766 mice. The data page provides detailed information about the available datasets, including sample metadata and experimental conditions. To download raw sequencing files (e.g., FASTQ) or processed expression data (e.g., TSS usage matrices), please visit the detail page.
Definition of TE-TSS & TE-TSS Region
What's the difference between annotated TE-derived TSS and predicted TE-derived TSS? Annotated TSS: The TSS reference file is extracted from annotation files from RefSeq, GENCODE and Ensembl, and only experimentally validated TSSs are retained. We use CAGE-seq (FANTOM5), RAMPAGE (ENCODE, human only), GRO-cap (human only), PRO-cap (human only), csRNA-seq (human only), and NET-CAGE (human only) as experimental validation.
Predicted TSS: The predicted TSS is predicted from individual samples. To ensure accuracy, we only retain TSSs that were predicted in at least three different samples.
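The "at least three different samples" rule can be illustrated with a minimal sketch. It assumes that predictions arrive as (sample_id, chrom, pos, strand) tuples and that two predictions count as the same TSS only when their coordinates match exactly. Both the input layout and the function name recurrent_tss are hypothetical and are not taken from the TE-TSS pipeline.

```python
from collections import defaultdict


def recurrent_tss(predictions, min_samples=3):
    """Keep TSSs that were predicted in at least `min_samples` distinct samples.

    `predictions` is an iterable of (sample_id, chrom, pos, strand) tuples.
    Matching on exact coordinates is an assumption made for this sketch.
    """
    samples_by_tss = defaultdict(set)
    for sample_id, chrom, pos, strand in predictions:
        # A set ignores repeated predictions from the same sample.
        samples_by_tss[(chrom, pos, strand)].add(sample_id)
    return sorted(
        tss for tss, samples in samples_by_tss.items()
        if len(samples) >= min_samples
    )
```

Counting distinct sample identifiers, rather than raw predictions, keeps a TSS that is reported several times within one sample from passing the filter on its own.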
Annotated TE-derived TSSs and predicted TE-derived TSSs are filtered from annotated TSSs and predicted TSSs, respectively, by overlapping them with TE annotations (RepeatMasker).
To account for the close proximity of many TSSs, we merge TSSs within ±50bp into a single TSS.
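One way to picture the merging step is the following sketch. It assumes TSSs are given as (chrom, pos, strand) tuples and groups positions on the same chromosome and strand whenever neighbouring positions are at most 50 bp apart. The function name merge_tss, the input layout, and the choice to return each group as a list of positions (rather than a single representative coordinate) are assumptions made for illustration; the text above does not specify how the merged TSS coordinate is chosen.

```python
from collections import defaultdict


def merge_tss(tss_list, window=50):
    """Group nearby TSSs on the same chromosome and strand.

    Returns a list of (chrom, strand, positions) tuples, where `positions`
    holds the sorted TSS coordinates that were merged together.
    """
    by_key = defaultdict(list)
    for chrom, pos, strand in tss_list:
        by_key[(chrom, strand)].append(pos)

    merged = []
    for (chrom, strand), positions in sorted(by_key.items()):
        positions.sort()
        cluster = [positions[0]]
        for pos in positions[1:]:
            if pos - cluster[-1] <= window:
                # Close enough to the previous TSS: same cluster.
                cluster.append(pos)
            else:
                merged.append((chrom, strand, cluster))
                cluster = [pos]
        merged.append((chrom, strand, cluster))
    return merged
```

Note that comparing each TSS with its immediate neighbour lets a chain of closely spaced TSSs grow beyond 50 bp in total. Whether the database bounds the overall cluster width is not stated here, so this behaviour is a property of the sketch only.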
What's a TE-TSS region? To explore the evolutionary and functional characteristics of the TE-TSSs, we define the TE-TSS region as the TE-TSS site extended by 50 bp on each side (±50bp) and use this region for further analysis.
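As a small worked example of this definition, the sketch below computes the region for a given TSS position. It assumes 1-based inclusive coordinates, so an unclipped region spans 101 bp, and it clips the interval at the chromosome ends. The coordinate convention, the clipping, and the function name te_tss_region are assumptions of this sketch rather than documented behaviour of the database.

```python
def te_tss_region(pos, flank=50, chrom_size=None):
    """Return (start, end) of the region `flank` bp either side of `pos`.

    Coordinates are treated as 1-based and inclusive (an assumption).
    The interval is clipped to [1, chrom_size] when chrom_size is given.
    """
    start = max(1, pos - flank)
    end = pos + flank
    if chrom_size is not None:
        end = min(end, chrom_size)
    return start, end


# Example: a TSS at position 1000 gives the region 950-1050.
print(te_tss_region(1000))
```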
BLAT Analysis
Which species were analyzed using BLAT? For human TE-TSS regions, we perform BLAT analysis on 5 primate species (baboon, chimp, crab-eating macaque, gibbon, and green monkey), 2 rodent species (rat and mouse), and 4 mammalian species (pig, dog, sheep, and rabbit), whereas for mouse TE-TSS regions, we conduct BLAT analysis on 3 rodent species (rat, Chinese hamster, and pika), 3 primate species (human, chimp, and rhesus), and 4 mammalian species (pig, dog, sheep, and rabbit).
What's modified BLAT score? The modified BLAT score is calculated using the following formula: score = (matches + repMatches - misMatches - qNumInsert - tNumInsert) / qSize.
matches - Number of bases that match that aren't repeats
repMatches - Number of bases that match but are part of repeats
misMatches - Number of bases that don't match
qNumInsert - Number of inserts in query
tNumInsert - Number of inserts in target
qSize - Query sequence size
To ensure the accuracy of BLAT alignments, we selectively retain sequences with a modified BLAT score greater than 0.7. Additionally, to avoid excessively long insertions in the target sequences, we constrain the target sequence size to be within 80% to 120% of the query sequence size. These filtered results are utilized for multiple sequence alignment and visualization in the genome browser.
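The score and the two filters above can be sketched as follows. The example assumes each BLAT hit is available as a dictionary keyed by the standard PSL field names, and it interprets "target sequence size" as the aligned target span (tEnd - tStart), since the PSL tSize field holds the length of the whole target sequence. That interpretation, and the function names modified_blat_score and passes_filter, are assumptions of this sketch.

```python
def modified_blat_score(hit):
    """Modified BLAT score as defined above, from PSL-style fields."""
    return (
        hit["matches"]
        + hit["repMatches"]
        - hit["misMatches"]
        - hit["qNumInsert"]
        - hit["tNumInsert"]
    ) / hit["qSize"]


def passes_filter(hit, min_score=0.7, min_ratio=0.8, max_ratio=1.2):
    """Apply the score cutoff and the target-size constraint.

    The target size is taken as the aligned span tEnd - tStart, which is
    an assumption about what the FAQ means by "target sequence size".
    """
    target_span = hit["tEnd"] - hit["tStart"]
    size_ok = min_ratio * hit["qSize"] <= target_span <= max_ratio * hit["qSize"]
    return modified_blat_score(hit) > min_score and size_ok


# Hypothetical hit for a 101 bp query (values invented for illustration).
example_hit = {
    "matches": 90, "repMatches": 5, "misMatches": 3,
    "qNumInsert": 1, "tNumInsert": 1, "qSize": 101,
    "tStart": 1000, "tEnd": 1100,
}
print(round(modified_blat_score(example_hit), 3), passes_filter(example_hit))
```

For the hypothetical hit, the score is (90 + 5 - 3 - 1 - 1) / 101 = 90 / 101, and the aligned target span of 100 bp lies within 80% to 120% of the 101 bp query, so the hit is retained.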
Motif Analysis
How is motif analysis performed? When performing motif analysis on TE-TSS regions, we use the JASPAR (JASPAR2022 CORE redundant v2) database and scan for motifs using the FIMO tool in the MEME Suite.
Cite & Contact
How to Cite? Please cite: Gu X, Wang M, Zhang XO. TE-TSS: an integrated data resource of human and mouse transposable element (TE)-derived transcription start site (TSS). Nucleic Acids Res. 2023 Nov 13:gkad1048. doi: 10.1093/nar/gkad1048.
Who should I contact if I have questions or find an error? Please contact zhangxiaoou@tongji.edu.cn and include a short description of the problem.



Copyright © 2023. Tongji University, Life Science and Technology Department, Xiao-Ou Zhang Lab
