IKKE: RNA Motif Discovery Tool
IKKE is a C program designed to analyze RNA sequences and identify enriched motifs in protein-bound RNA. It follows a structured pipeline including k-mer counting, frequency calculations, and enrichment analysis, producing ranked k-mers based on their likelihood of being a motif.
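The pipeline described above (count k-mers, convert counts to frequencies, compare test against control, rank) can be illustrated with a minimal Python sketch. This is not IKKE's C implementation. The function names are hypothetical, and the enrichment is assumed to be the plain ratio of test frequency to control frequency, which the README does not confirm.

```python
import math
from collections import Counter


def count_kmers(sequences, k):
    """Count every overlapping k-mer across a list of sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper().replace("T", "U")  # treat DNA-style input as RNA
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def frequencies(counts):
    """Convert raw k-mer counts to relative frequencies."""
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def rank_enrichments(test_seqs, control_seqs, k=3, log2=True):
    """Rank k-mers by (assumed) enrichment R = f_test / f_control."""
    f_test = frequencies(count_kmers(test_seqs, k))
    f_ctrl = frequencies(count_kmers(control_seqs, k))
    ranked = []
    for kmer, ft in f_test.items():
        fc = f_ctrl.get(kmer)
        if not fc:
            continue  # this sketch skips k-mers absent from the control
        r = ft / fc
        ranked.append((kmer, math.log2(r) if log2 else r))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Made-up toy sequences, for illustration only.
bound = ["GGACUGGACU", "UUGGACUAA"]
control = ["GGACAUGCUA", "ACUGAUUGGC"]
for kmer, value in rank_enrichments(bound, control, k=3)[:3]:
    print(kmer, round(value, 3))
```

A real tool would need a policy for k-mers missing from the control (for example a pseudocount). The sketch simply drops them.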
Usage
    ikke -t [test] -c [control] [OPTIONS]
At minimum, you must provide:
- -t, --test: RNA sequences bound to protein (test set)
- -c, --control: Control RNA sequences (background set); not required when using -p or -s
Command Line Options
General
- -h, --help: Print help and exit.
- --detailed-help: Print detailed help, including hidden options, and exit.
- -V, --version: Print program version and exit.
Input/Output
- -t, --test=filename: Input file containing protein-bound RNA sequences. Only one test file is supported.
- -c, --control=filename: Input file containing control RNA sequences. Only one control file is supported.
- -o, --output=filename: Set the output file prefix. Default: motif.
- -k, --kmer=INT: Length of k-mers to analyze. Default: 3.
- -i, --iterations=INT: Number of iterations for the analysis. Default: 1.
- --threads=INT: Number of threads for parallel computation. Default: 1. ⚠️ Warning: For long sequences (>16,000 nt), multi-threading may cause incorrect counts or failures. Use with caution.
- -d, --delimiter=char: Output delimiter. Default: , (CSV). Supported delimiters:
  - , → .csv
  - \t (tab) → .tsv
  - : or | or space → .dsv
- --no-log: Disable log2 normalization of enrichment values. By default, enrichments are log2-normalized for interpretability.
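The rule above that ties the output delimiter to the file extension can be summarized in a short Python sketch. The function name `output_filename` is hypothetical. Raising an error for an unsupported delimiter and accepting the two-character string `\t` as a tab are assumptions of this sketch, not documented IKKE behavior.

```python
# Delimiter -> extension mapping as listed in the options above.
EXTENSIONS = {
    ",": ".csv",
    "\t": ".tsv",
    ":": ".dsv",
    "|": ".dsv",
    " ": ".dsv",
}


def output_filename(prefix="motif", delimiter=","):
    """Build the output file name from the prefix (-o) and delimiter (-d)."""
    if delimiter == "\\t":  # literal backslash + t, as typed in a shell
        delimiter = "\t"
    if delimiter not in EXTENSIONS:
        raise ValueError(f"unsupported delimiter: {delimiter!r}")
    return prefix + EXTENSIONS[delimiter]


print(output_filename())                     # motif.csv
print(output_filename("experiment1", "\t"))  # experiment1.tsv
print(output_filename(delimiter="|"))        # motif.dsv
```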
Algorithms
- -R, --enrichments: Compute enrichment values (R values) from k-mer frequencies.
- -s, --shuffle: Shuffle sequences while preserving k-let counts.
- --klet=INT: Set k-let size for sequence shuffling (ushuffle). Default: -1 (automatic).
- -p, --independent-probs: Compute enrichments without using a control. Instead, expected frequencies are derived from mono-/di-nucleotide distributions.
- -b, --bootstrap[=INT]: Perform bootstrapping by subsampling sequences. Default: 10 iterations. Computes mean enrichments and standard deviation from resampled subsets.
- --sample=INT: Percent of sequences randomly subsampled per bootstrap iteration. Range: 1–100. Default: 10.
- --seed=INT: Random seed for bootstrap sampling. Default: -1 (random). Use a fixed seed for reproducible results.
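To make the idea behind -p concrete, here is a Python sketch of one common way to derive an expected k-mer frequency from mono- and di-nucleotide frequencies: a first-order Markov estimate. The README does not state which formula IKKE uses, so both the formula and the function names here are assumptions.

```python
from collections import Counter


def freq_table(sequences, k):
    """Relative frequency of every overlapping k-mer in the sequences."""
    counts = Counter(
        seq[i:i + k] for seq in sequences for i in range(len(seq) - k + 1)
    )
    total = sum(counts.values())
    return {w: n / total for w, n in counts.items()}


def expected_frequency(kmer, mono, di):
    """First-order Markov estimate (an assumption, not IKKE's documented rule):
    product of the dinucleotide frequencies along the k-mer, divided by the
    mononucleotide frequencies of its interior bases."""
    if len(kmer) == 1:
        return mono.get(kmer, 0.0)
    p = 1.0
    for i in range(len(kmer) - 1):
        p *= di.get(kmer[i:i + 2], 0.0)
    for base in kmer[1:-1]:
        f = mono.get(base, 0.0)
        if f == 0.0:
            return 0.0
        p /= f
    return p


# Made-up toy sequences, for illustration only.
seqs = ["GGACUGGACU", "UUGGACUAA"]
mono, di, observed = freq_table(seqs, 1), freq_table(seqs, 2), freq_table(seqs, 3)
for kmer, f_obs in sorted(observed.items()):
    print(kmer, round(f_obs / expected_frequency(kmer, mono, di), 3))
```

Because every observed 3-mer consists of observed dinucleotides, the expected frequency in the usage loop is never zero.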
Examples
Basic run with default settings
    ikke -t bound.fa -c control.fa
Produces motif.csv with 3-mer enrichments using iterative k-mer knockout enrichment (log2-normalized).
IKKE Iterations
    ikke -t bound.fa -c control.fa -i 10
Produces motif.csv with 3-mer enrichments using iterative k-mer knockout enrichment over 10 iterations. The output contains the 10 most enriched k-mers. Use the -i flag to set the number of IKKE iterations.
Custom k-mer size and output prefix
    ikke -t bound.fa -c control.fa -k 7 -o experiment1
Produces experiment1.csv with 7-mer enrichments. Specify k-mer size with the -k flag.
Regular Enrichments
    ikke -t bound.fa -c control.fa -R -k 5
Produces motif.csv with 5-mer enrichments. The -R flag is used to compute regular enrichments. Computes all 4^k enrichments (1024 for 5-mers) and orders them by their R value.
Bootstrapped enrichment analysis
    ikke -t bound.fa -c control.fa -b 50 --sample=20 --seed=123
Runs 50 bootstrap iterations, subsampling 20% of sequences each time. Specify a seed for reproducible results.
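The bootstrap procedure can be sketched in Python as follows: draw a percentage of the sequences per iteration, compute a per-k-mer score on each subset, then report the mean and standard deviation. The names `bootstrap` and `kmer_frequencies` are hypothetical. The score used here is plain k-mer frequency rather than IKKE's enrichment, and the choice of population standard deviation is an assumption.

```python
import random
import statistics
from collections import Counter, defaultdict


def kmer_frequencies(sequences, k):
    """Stand-in score: relative frequency of each k-mer in the subset."""
    counts = Counter(s[i:i + k] for s in sequences for i in range(len(s) - k + 1))
    total = sum(counts.values())
    return {w: n / total for w, n in counts.items()} if total else {}


def bootstrap(sequences, score, iterations=10, sample=10, seed=-1):
    """Return {kmer: (mean, std)} over `iterations` random subsamples.

    `sample` is the percent of sequences drawn per iteration (1-100);
    seed == -1 means "unseeded", mirroring the documented default.
    """
    rng = random.Random(None if seed == -1 else seed)
    n = max(1, round(len(sequences) * sample / 100))
    values = defaultdict(list)
    for _ in range(iterations):
        subset = rng.sample(sequences, n)  # without replacement within a draw
        for kmer, value in score(subset).items():
            values[kmer].append(value)
    # A k-mer absent from a subset contributes no value for that iteration.
    return {
        kmer: (statistics.mean(v), statistics.pstdev(v))
        for kmer, v in values.items()
    }


# Made-up toy sequences, for illustration only.
seqs = ["GGACUGGACU", "UUGGACUAA", "ACGUACGUAC", "UGCAUGCAUG", "GGGACUUUAC"]
result = bootstrap(seqs, lambda s: kmer_frequencies(s, 3),
                   iterations=50, sample=20, seed=123)
for kmer in sorted(result)[:3]:
    print(kmer, result[kmer])
```

With a fixed seed, repeated calls return identical results, which is the reproducibility that --seed is meant to provide.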
Tab-delimited output
    ikke -t bound.fa -c control.fa -d "\t"
Produces motif.tsv instead of CSV. Columns will be tab-delimited.
Multi-threading (use with caution for long sequences)
    ikke -t bound.fa -c control.fa --threads=4
Control Independent Enrichments
    ikke -t bound.fa -p
Produces a motif.csv file with 3-mer enrichments. No control sequences are required.
Sequence Shuffling
    ikke -t bound.fa -s
Produces a motif.csv file with 3-mer enrichments. The control dataset is built from shuffled sequences of the test dataset, with k-let=1.
Sequence Shuffling with custom k-mer
    ikke -t bound.fa -k 6 -s --klet=3
Produces a motif.csv file with 6-mer enrichments. Shuffled sequences (for the control dataset) preserve tri-nucleotide counts.
Output Format
The output file contains k-mers ranked by enrichment. Columns typically include:
- k-mer : The k-mer sequence
- Enrichment : Raw or log2-normalized enrichment value
- Std. Dev. (if bootstrapped) : Variation across bootstrap iterations
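A short Python sketch of loading such a file for downstream analysis. The column order (k-mer, enrichment, optional standard deviation), the possible presence of a header row, and the header names in the sample are assumptions based on the list above. The sample values are made up, so check them against a real output file.

```python
import csv
import io


def read_enrichments(handle, delimiter=","):
    """Parse rows of (k-mer, enrichment, std or None) from an open file."""
    rows = []
    for fields in csv.reader(handle, delimiter=delimiter):
        if len(fields) < 2:
            continue  # blank or malformed line
        try:
            enrichment = float(fields[1])
        except ValueError:
            continue  # header row or other non-numeric line
        std = float(fields[2]) if len(fields) > 2 and fields[2] else None
        rows.append((fields[0], enrichment, std))
    return rows


# Hypothetical file content, for illustration only.
sample = io.StringIO("kmer,enrichment,stddev\nGAC,1.50,0.10\nUGG,0.75,0.20\n")
for kmer, enrichment, std in read_enrichments(sample):
    print(kmer, enrichment, std)
```

For a real run, replace the in-memory sample with `open("motif.csv", newline="")` and pass the delimiter that was given to -d.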
Notes
- Input sequences should be provided in plain FASTA/FASTQ/raw reads format.
- Bootstrap can be applied to all enrichment algorithms (ikke, R, probs, and shuffle).
- For reproducibility of randomized methods (shuffle/bootstrap), specify --seed.
Citation
If you use IKKE in your research, please cite the associated publication.