Chen L, Yu Y, Zhang X, Ye C, Fan L*. 2016. PcircRNA_finder: a software for circRNA prediction in plants
Availability: http://ibi.zju.edu.cn/bioinplant/tools/PcircRNA_finder.zip
E-mail:
Li Chen, thisisemailhello@gmail.com
Longjiang Fan, fanlj@zju.edu.cn
Installation prerequisites
1. Python (version=2.7)
3. STAR
https://github.com/alexdobin/STAR
4. tophat
http://ccb.jhu.edu/software/tophat/fusion_index.shtml
5. find_circ
http://circbase.org/download/find_circ.tar.gz
6. mapsplice
http://www.netlab.uky.edu/p/bioinfo/MapSplice2
7. segemehl
http://www.bioinf.uni-leipzig.de/Software/segemehl/
8. PcircRNA_finder
http://ibi.zju.edu.cn/bioinplant/tools/PcircRNA_finder.zip
9. bedtools
https://sourceforge.net/projects/bedtools/
10. samtools
http://samtools.sourceforge.net/
11. bowtie2
http://bowtie-bio.sourceforge.net/bowtie2/index.shtml
12. bowtie1
http://bowtie-bio.sourceforge.net/index.shtml
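Before running the pipeline it can help to confirm that the tools listed above are reachable from the shell. The following is a minimal sketch, not part of PcircRNA_finder itself. The function name check_tools is hypothetical, and the executable names are assumptions based on the example commands in this README (your installation may use different names or full paths):

```shell
#!/bin/sh
# Report which of the given executables are missing from PATH.
# Prints one "missing: NAME" line per absent tool and returns
# non-zero if at least one tool is absent.
check_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            rc=1
        fi
    done
    return $rc
}

# Assumed executable names; perl is included because the final step
# is started with "perl ecircRNA_finder.pl".
check_tools python perl STAR tophat2 bowtie bowtie2 samtools bamToFastq segemehl.x \
    || echo "Install the missing tools (or add them to PATH) before running the pipeline."
```

find_circ and mapsplice are started as Python scripts in the example commands, so they are not looked up on PATH here; check their script paths separately.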
Running steps
1. Run STAR to generate the "stiChimeric.out.junction" file. Example commands:
STAR --runMode genomeGenerate --genomeDir /home3/cl/at_sti_15x/STAR_index --genomeFastaFiles Athaliana_167.fa --runThreadN 8
STAR --genomeDir /home3/cl/at_sti_15x/STAR_index --readFilesIn ../left.fastq ../right.fastq --runThreadN 8 --sjdbGTFfile /home1/cl/ok/TAIR10_GFF3_genes.gtf --chimSegmentMin 20 --chimScoreMin 1 --alignIntronMax 100000 --outFilterMismatchNmax 4 --alignTranscriptsPerReadNmax 100000 --outFilterMultimapNmax 2 --outFileNamePrefix sti --outSAMtype BAM SortedByCoordinate
2. Run tophat to generate the "accepted_hits.bam" file. Example commands:
tophat2 -a 6 --microexon-search -m 2 -p 10 -G $gtf_file -o ${sample_name} $genome_bowtie_index $fastq1list $fastq2list
bamToFastq -i ${sample_name}/unmapped.bam -fq ${sample_name}/unmapped.fastq
tophat2 -o ${sample_name}_fusion -p 15 --fusion-search --keep-fasta-order --bowtie1 --no-coverage-search $genome_bowtie_index ${sample_name}/unmapped.fastq
3. Run find_circ to generate the "find_circ_circRNA.bed" file. Example commands:
$bowtie2_path -p16 --very-sensitive --mm -M20 --score-min=C,-15,0 -x $genome_index -q -U $fastq_file 2> ${name}_bt2_firstpass.log | $samtools_path view -hbuS - | $samtools_path sort - $name
$samtools_path view -hf 4 ${name}.bam | $samtools_path view -Sb - > unmapped_${name}.bam
python $find_circRNA_path/unmapped2anchors.py unmapped_${name}.bam > unmapped_${name}_anchors.qfa
mkdir -p test_out_$name
$bowtie2_path --reorder --mm -M20 --score-min=C,-15,0 -q -x $genome_index -U unmapped_${name}_anchors.qfa 2> bt2_secondpass_test.log | python $find_circRNA_path/find_circ.py -r $sample_description_file -G $chr_dir -p $prefix -s test_out_$name/sites.log > test_out_$name/sites.bed 2> test_out_$name/sites.reads
grep _circ_ test_out_$name/sites.bed > test_out_$name/find_circ_circRNA.bed
4. Run mapsplice to generate the "fusions_raw.txt" file. Example command:
python /datacenter/disk3/cl/mapsplice/MapSplice-v2.1.8/mapsplice.py -p 10 --non-canonical --fusion-non-canonical --min-fusion-distance 200 -c /datacenter/disk1/cl/data/at/chr -x /home1/cl/ok/Athaliana_167.fa --gene-gtf /home1/cl/ok/TAIR10_GFF3_genes.gtf -1 ../left.fastq -2 ../right.fastq --qual-scale phred33 -o circ
5. Run segemehl to generate the "splicesites.bed" file. Example commands:
$segemehl_path/segemehl.x -t 8 -s -d $genome_fasta -i $genome_segemehl_index -q $fastq1_file -p $fastq2_file -o ${name_1}.sam -u ${name_1}.fq -S -T -D 2
samtools view -hbuS ${name_1}.sam | samtools sort - ${name_1}.sorted
samtools view -h ${name_1}.sorted.bam | gzip -c > ${name_1}.sorted.sam.gz
$segemehl_path/testrealign.x -d $genome_fasta -q ${name_1}.sorted.sam.gz -U ${name_1}.splitmap.txt -T ${name_1}.transmap.txt -n
6. Run PcircRNA_finder to generate the final results. Example command:
perl ecircRNA_finder.pl /home3/cl/at_sti_15x/re_run1/1/1/stiChimeric.out.junction /home3/cl/at_sti_15x/re_run1/1/1/sti_fusion/accepted_hits.bam /home3/cl/at_sti_15x/re_run1/1/1/test_out_sti/find_circ_circRNA.bed /home3/cl/at_sti_15x/re_run1/1/1/circ/fusions_raw.txt /home3/cl/at_sti_15x/re_run1/1/1/splicesites.bed 20000 /home3/cl/at_sti_15x/re_run1/1/1/at.txt 5 5 /home1/cl/ok/Athaliana_167.fa 100 1 left.fastq right.fastq 1
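The first five arguments of the command above are the files produced in steps 1 to 5. Because the final step depends on all of them, a quick check that each file exists and is non-empty can save a failed run. This is a minimal sketch under stated assumptions: the function name check_inputs and the variable run_dir are hypothetical, and the relative paths simply mirror the layout used in the example command:

```shell
#!/bin/sh
# Return non-zero if any of the given files is missing or empty,
# printing one diagnostic line per problem file to stderr.
check_inputs() {
    rc=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            rc=1
        fi
    done
    return $rc
}

# run_dir is a hypothetical variable naming the directory that holds
# the outputs of steps 1 to 5; it defaults to the current directory.
run_dir=${run_dir:-.}
check_inputs \
    "$run_dir/stiChimeric.out.junction" \
    "$run_dir/sti_fusion/accepted_hits.bam" \
    "$run_dir/test_out_sti/find_circ_circRNA.bed" \
    "$run_dir/circ/fusions_raw.txt" \
    "$run_dir/splicesites.bed" \
    || echo "Re-run the step that produces each file reported above." >&2
```

If the check passes silently, the five paths can be passed to ecircRNA_finder.pl in the same order as in the example command.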
Output files
column 1: circRNA position
column 2: canonical splicing corresponding position
column 3: chromosome
column 4: start (0-based)
column 5: end
column 6: name
column 7: -
column 8: strand
column 9: host gene and the exons included in the circRNA
column 10: splicing type. Canonical means the circRNA position is the same as the canonical splicing sites; Alt_acceptor means the circRNA position has an alternative acceptor site compared to the canonical splicing sites; Alt_donor means the circRNA position has an alternative donor site compared to the canonical splicing sites. Values can be combined, e.g. Alt_acceptor,Alt_donor
column 11: number of back-spliced reads
example output file: left_Final_exonic_circRNA.txt
Chr2:9845487..9848769 Chr2:9845486..9848769 Chr2 9845486 9848769 backjunc_117 0,16,30,17,0 - AT2G23140.1:exon_5,exon_4,exon_3,exon_2||AT2G23140.2:exon_5,exon_4,exon_3,exon_2 Alt_acceptor 16
Chr3:10084136..10086696 Chr3:10084136..10086696 Chr3 10084135 10086696 backjunc_118 0,19,0,23,0 - AT3G27300.1:exon_12,exon_11,exon_10,exon_9,exon_8,exon_7,exon_6,exon_5,exon_4,exon_3,exon_2,exon_1||AT3G27300.2:exon_13,exon_12,exon_11,exon_10,exon_9,exon_8,exon_7,exon_6,exon_5,exon_4,exon_3,exon_2,exon_1||AT3G27300.3:exon_12,exon_11,exon_10,exon_9,exon_8,exon_7,exon_6,exon_5,exon_4,exon_3,exon_2,exon_1 Canonical 6
Chr3:10084137..10086697 Chr3:10084136..10086696 Chr3 10084136 10086697 backjunc_119 0,0,36,0,0 - AT3G27300.1:exon_12,exon_11,exon_10,exon_9,exon_8,exon_7,exon_6,exon_5,exon_4,exon_3,exon_2,exon_1||AT3G27300.2:exon_13,exon_12,exon_11,exon_10,exon_9,exon_8,exon_7,exon_6,exon_5,exon_4,exon_3,exon_2,exon_1||AT3G27300.3:exon_12,exon_11,exon_10,exon_9,exon_8,exon_7,exon_6,exon_5,exon_4,exon_3,exon_2,exon_1 Alt_acceptor,Alt_donor 12
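The column layout described above makes the output easy to post-process with standard tools. As an illustration, the sketch below keeps only circRNAs supported by a minimum number of back-spliced reads (column 11) and prints them in a 6-column BED-like form. It assumes the file is whitespace-delimited with the 11 columns listed above; the function name filter_circ and the threshold of 10 are hypothetical choices, not part of PcircRNA_finder:

```shell
#!/bin/sh
# Usage: filter_circ MIN_READS FILE
# Keep records with at least MIN_READS back-spliced reads (column 11)
# and print chromosome, start, end, name, read count, strand
# (columns 3, 4, 5, 6, 11, 8) separated by tabs.
filter_circ() {
    min_reads=$1
    file=$2
    awk -v min="$min_reads" '
        BEGIN { OFS = "\t" }
        NF >= 11 && $11 + 0 >= min { print $3, $4, $5, $6, $11, $8 }
    ' "$file"
}

# Example call, run only if the example output file is present.
if [ -f left_Final_exonic_circRNA.txt ]; then
    filter_circ 10 left_Final_exonic_circRNA.txt > left_circRNA_min10.bed
fi
```

Applied to the three example records above with a threshold of 10, this keeps backjunc_117 (16 reads) and backjunc_119 (12 reads) and drops backjunc_118 (6 reads).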
Example data test
1. Example paired-end (PE) RNA-seq data from A. thaliana (left.fastq, right.fastq)
perl ecircRNA_finder.pl /home3/cl/at_sti_15x/re_run1/1/1/stiChimeric.out.junction /home3/cl/at_sti_15x/re_run1/1/1/sti_fusion/accepted_hits.bam /home3/cl/at_sti_15x/re_run1/1/1/test_out_sti/find_circ_circRNA.bed /home3/cl/at_sti_15x/re_run1/1/1/circ/fusions_raw.txt /home3/cl/at_sti_15x/re_run1/1/1/splicesites.bed 20000 /home3/cl/at_sti_15x/re_run1/1/1/at.txt 5 5 /home1/cl/ok/Athaliana_167.fa 100 1 left.fastq right.fastq 1
2. Example paired-end (PE) RNA-seq data from rice (left.fastq, right.fastq)
perl ecircRNA_finder.pl /home2/cl/compare/new_compare_data/new_method_v2/reads/sti2/stiChimeric.out.junction /home2/cl/compare/new_compare_data/new_method_v2/reads/sti2/sti_fusion/accepted_hits.bam /home2/cl/compare/new_compare_data/new_method_v2/reads/sti2/test_out_sti/find_circ_circRNA.bed /home2/cl/compare/new_compare_data/new_method_v2/reads/sti2/mapsplice/circ/fusions_raw.txt /home2/cl/compare/new_compare_data/new_method_v2/reads/sti2/seg/splicesites.bed 20000 /home2/cl/compare/all.gff3 5 5 /home1/cl/ok/all.fa 100 1 left.fastq right.fastq 1