IKKE: RNA Motif Discovery Tool

IKKE is a C program that analyzes RNA sequences and identifies motifs enriched in protein-bound RNA. Its pipeline counts k-mers, computes their frequencies, and performs enrichment analysis, then ranks the k-mers by how likely each is to be a motif.
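The first two pipeline steps, k-mer counting and frequency calculation, can be illustrated with a short Python sketch. This is not IKKE's C implementation; the function names `count_kmers` and `kmer_frequencies` are hypothetical, and the sketch assumes overlapping k-mers counted across all sequences.

```python
from collections import Counter


def count_kmers(sequences, k):
    """Count every overlapping k-mer across a list of sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        # A sequence of length n contains n - k + 1 overlapping k-mers.
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def kmer_frequencies(counts):
    """Convert raw counts to relative frequencies that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


counts = count_kmers(["ACGUACGU", "GUAC"], 3)
freqs = kmer_frequencies(counts)
```

Frequencies rather than raw counts make the test and control sets comparable even when they contain different numbers of sequences.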

Usage

ikke -t [test] -c [control] [OPTIONS]

At minimum, you must provide:

  • -t, --test : RNA sequences bound to protein (test set)
  • -c, --control : Control RNA sequences (background set). Not required when using -p or -s.

Command Line Options

General

  • -h, --help Print help and exit.

  • --detailed-help Print detailed help, including hidden options, and exit.

  • -V, --version Print program version and exit.

Input/Output

  • -t, --test=filename Input file containing protein-bound RNA sequences. Only one test file is supported.

  • -c, --control=filename Input file containing control RNA sequences. Only one control file is supported.

  • -o, --output=filename Set the output file prefix. Default: motif.

  • -k, --kmer=INT Length of k-mers to analyze. Default: 3.

  • -i, --iterations=INT Number of iterations for the analysis. Default: 1.

  • --threads=INT Number of threads for parallel computation. Default: 1. ⚠️ Warning: For long sequences (>16,000 nt), multi-threading may cause incorrect counts or failures. Use with caution.

  • -d, --delimiter=char Output delimiter. Default: , (CSV). Supported delimiters:

    • , (comma) → .csv
    • \t (tab) → .tsv
    • : or | or space → .dsv

  • --no-log Disable log2 normalization of enrichment values. By default, enrichments are log2-normalized for interpretability.
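The way the output prefix and delimiter combine into a file name can be sketched as follows. This is an illustration of the documented mapping only; `output_filename` is a hypothetical helper, and raising an error for unsupported delimiters is an assumption of the sketch, not documented IKKE behavior.

```python
# Delimiter-to-extension mapping as listed in the options above.
EXTENSIONS = {",": ".csv", "\t": ".tsv", ":": ".dsv", "|": ".dsv", " ": ".dsv"}


def output_filename(prefix="motif", delimiter=","):
    """Build the output file name from the -o prefix and the -d delimiter."""
    if delimiter not in EXTENSIONS:
        # Assumption: unsupported delimiters are rejected.
        raise ValueError(f"unsupported delimiter: {delimiter!r}")
    return prefix + EXTENSIONS[delimiter]
```

For example, `output_filename("experiment1")` gives `experiment1.csv`, and `output_filename(delimiter="\t")` gives `motif.tsv`.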

Algorithms

  • -R, --enrichments Compute enrichment values (R values) from k-mer frequencies.

  • -s, --shuffle Shuffle sequences while preserving k-let counts.

  • --klet=INT Set k-let size for sequence shuffling (ushuffle). Default: -1 (automatic).

  • -p, --independent-probs Compute enrichments without using a control. Instead, expected frequencies are derived from mono-/di-nucleotide distributions.

  • -b, --bootstrap[=INT] Perform bootstrapping by subsampling sequences. Default: 10 iterations. Computes mean enrichments and standard deviation from resampled subsets.

  • --sample=INT Percent of sequences randomly subsampled per bootstrap iteration. Range: 1–100. Default: 10.

  • --seed=INT Random seed for bootstrap sampling. Default: -1 (random). Use a fixed seed for reproducible results.
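The bootstrap options can be read as the procedure sketched below in Python. Several points are assumptions rather than documented behavior: the R value is taken to be the ratio of test to control k-mer frequency, both sets are subsampled without replacement, and iterations in which the k-mer is absent from either subsample are skipped. All function names are hypothetical.

```python
import math
import random
import statistics
from collections import Counter


def kmer_frequencies(sequences, k):
    """Relative frequency of each overlapping k-mer in the sequences."""
    counts = Counter(seq[i:i + k] for seq in sequences
                     for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def bootstrap_enrichment(test, control, kmer, iterations=10, sample=10, seed=None):
    """Mean and standard deviation of log2 enrichment over random subsamples.

    sample is the percentage of sequences drawn per iteration (1 to 100).
    seed=None gives a different result on every run; a fixed seed is reproducible.
    """
    rng = random.Random(seed)
    k = len(kmer)
    n_test = max(1, len(test) * sample // 100)
    n_control = max(1, len(control) * sample // 100)
    values = []
    for _ in range(iterations):
        t = kmer_frequencies(rng.sample(test, n_test), k)
        c = kmer_frequencies(rng.sample(control, n_control), k)
        if kmer in t and kmer in c:
            # ASSUMPTION: enrichment R = test frequency / control frequency.
            values.append(math.log2(t[kmer] / c[kmer]))
    if not values:
        return None
    std = statistics.stdev(values) if len(values) > 1 else 0.0
    return statistics.mean(values), std
```

Calling the function twice with the same `seed` returns identical values, which mirrors the purpose of `--seed`.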

Examples

  1. Basic run with default settings

    ikke -t bound.fa -c control.fa

    Produces motif.csv with 3-mer enrichments using iterative k-mer knockout enrichment (log2-normalized).

  2. IKKE Iterations

    ikke -t bound.fa -c control.fa -i 10

    Produces motif.csv with 3-mer enrichments from 10 iterations of iterative k-mer knockout enrichment. The file contains the 10 most enriched k-mers. Set the number of iterations with the -i flag.

  3. Custom k-mer size and output prefix

    ikke -t bound.fa -c control.fa -k 7 -o experiment1

    Produces experiment1.csv with 7-mer enrichments. Specify k-mer size with the -k flag.

  4. Regular Enrichments

    ikke -t bound.fa -c control.fa -R -k 5

    Produces motif.csv with 5-mer enrichments. The -R flag computes regular enrichments for all 4^k k-mers (1024 for 5-mers) and orders them by R value.

  5. Bootstrapped enrichment analysis

    ikke -t bound.fa -c control.fa -b 50 --sample=20 --seed=123

    Runs 50 bootstrap iterations, each subsampling 20% of the sequences. The fixed seed makes the results reproducible.

  6. Tab-delimited output

    ikke -t bound.fa -c control.fa -d "\t"

    Produces motif.tsv, with tab-delimited columns, instead of motif.csv.

  7. Multi-threading (use with caution for long sequences)

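    ikke -t bound.fa -c control.fa --threads=4

    Runs the default analysis on 4 threads. For sequences longer than 16,000 nt, multi-threading may cause incorrect counts or failures.

  8. Control Independent Enrichments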

    ikke -t bound.fa -p

    Produces a motif.csv file with 3-mer enrichments. No control sequences are required.

  9. Sequence Shuffling

    ikke -t bound.fa -s

    Produces a motif.csv file with 3-mer enrichments. The control dataset is generated by shuffling the test sequences with k-let=1.

  10. Sequence Shuffling with custom k-mer and k-let sizes

    ikke -t bound.fa -k 6 -s --klet=3

    Produces a motif.csv file with 6-mer enrichments. The shuffled sequences used as the control dataset preserve trinucleotide counts.
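For the k-let=1 case used in example 9, preserving k-let counts reduces to a plain permutation of each sequence's nucleotides. The Python sketch below shows only that simplest case; the function name `shuffle_mononucleotides` is hypothetical, and larger k-let sizes (such as `--klet=3` in example 10) need the more involved algorithm provided by ushuffle, which is not reproduced here.

```python
import random


def shuffle_mononucleotides(seq, seed=None):
    """Permute a sequence's letters, preserving mononucleotide counts (k-let=1)."""
    rng = random.Random(seed)
    letters = list(seq)
    rng.shuffle(letters)  # in-place Fisher-Yates shuffle
    return "".join(letters)


original = "ACGUACGUAAGG"
shuffled = shuffle_mononucleotides(original, seed=1)
```

The shuffled sequence has the same length and base composition as the original, so any k-mer enrichment measured against it reflects sequence order rather than nucleotide content.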

Output Format

The output file lists k-mers ranked by enrichment. Columns typically include:

  • k-mer : The k-mer sequence
  • Enrichment : Enrichment value, log2-normalized by default or raw with --no-log
  • Std. Dev. (if bootstrapped) : Variation across bootstrap iterations
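A minimal Python sketch for loading such a file follows. It assumes the column order listed above (k-mer, enrichment, optional standard deviation) and skips any line whose second field is not numeric, such as a header row; whether IKKE writes a header is not stated here. The helper name `read_enrichments` and the sample values are made up for illustration and are not real IKKE output.

```python
import csv
import io


def _to_float(text):
    """Return text as a float, or None if it is empty or not numeric."""
    try:
        return float(text)
    except ValueError:
        return None


def read_enrichments(handle, delimiter=","):
    """Parse rows into (kmer, enrichment, std_dev or None) tuples."""
    rows = []
    for fields in csv.reader(handle, delimiter=delimiter):
        if len(fields) < 2:
            continue
        enrichment = _to_float(fields[1])
        if enrichment is None:
            continue  # header row or malformed line
        std_dev = _to_float(fields[2]) if len(fields) > 2 else None
        rows.append((fields[0], enrichment, std_dev))
    return rows


# Made-up example content, not real IKKE output.
example = io.StringIO("kmer,enrichment\nUGU,2.5\nGUA,1.25\n")
rows = read_enrichments(example)
```

For a .tsv file, pass `delimiter="\t"`; for a real file, replace the `io.StringIO` object with `open(path, newline="")`.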

Notes

  • Input sequences should be provided as plain-text FASTA, FASTQ, or raw reads.
  • Bootstrap can be applied to all enrichment algorithms (ikke, R, probs, and shuffle).
  • For reproducibility of randomized methods (shuffle/bootstrap), specify --seed.
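The three input formats differ only in how records are delimited, which the Python sketch below illustrates. It is not IKKE's parser: the function name `read_sequences` is hypothetical, format detection by the first character is an assumption, FASTQ records are assumed to span exactly four lines, and raw reads are assumed to be one sequence per line.

```python
def read_sequences(lines):
    """Extract sequences from FASTA, FASTQ, or raw-read lines."""
    lines = [ln.strip() for ln in lines if ln.strip()]
    if not lines:
        return []
    if lines[0].startswith(">"):
        # FASTA: a '>' header starts a record; sequence may span several lines.
        seqs, current = [], []
        for ln in lines:
            if ln.startswith(">"):
                if current:
                    seqs.append("".join(current))
                current = []
            else:
                current.append(ln)
        if current:
            seqs.append("".join(current))
        return seqs
    if lines[0].startswith("@"):
        # FASTQ: header, sequence, '+', quality; keep the second line of each record.
        return lines[1::4]
    # Raw reads: every non-empty line is a sequence.
    return lines
```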

Citation

If you use IKKE in your research, please cite the associated publication.