A. Using the Software

1. What are the conversion steps for deconvolution?

The Barcode Deconvoluter software requires a library file (.BLIB) containing barcode annotations and one or more input files with sequencing data. The Deconvoluter workflow is as follows: Illumina sequencing data → TAB file → annotated file with counts.
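As a rough illustration of what the last step produces, the sketch below counts exact barcode matches against a small annotation table. It is not the Deconvoluter implementation: the BLIB format is proprietary, so the `library` dictionary, the gene names, and the `count_barcodes` function are hypothetical stand-ins, and the reads are assumed to be plain 18-base strings.

```python
from collections import Counter


def count_barcodes(reads, library):
    """Count exact matches of each read against the library.

    Returns (counts, skipped): a Counter keyed by barcode sequence and
    a list of reads that matched no barcode in the library.
    """
    counts = Counter()
    skipped = []
    for read in reads:
        seq = read.strip().upper()
        if seq in library:
            counts[seq] += 1
        else:
            skipped.append(seq)
    return counts, skipped


# Hypothetical library: 18-base barcode -> annotation (a gene symbol here).
library = {
    "ACGTACGTACGTACGTAC": "GENE_A",
    "TTGCATTGCATTGCATTG": "GENE_B",
}
reads = [
    "ACGTACGTACGTACGTAC",
    "acgtacgtacgtacgtac",  # same barcode, lower case
    "TTGCATTGCATTGCATTG",
    "GGGGGGGGGGGGGGGGGG",  # not in the library, so it is skipped
]

counts, skipped = count_barcodes(reads, library)
for barcode, n in counts.items():
    print(library[barcode], barcode, n)
```

The sketch handles exact matches only; the error-correction options described under the program options are not modeled.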

2. What file formats can be used as Deconvoluter input?

Deconvoluter uses a proprietary BLIB format for the library annotation files. Library annotation files for all DECIPHER modules are available upon request. Please contact us at support@decipherproject.net.

For sequencing data, our software supports most major data formats that HT Sequencing facilities use: FASTA, FASTQ, QSEQ, RAW, Miro, and Seq. It will generate a tab file from any of these, and will further convert it into an annotated file. Also, many HT Sequencing facilities offer an option of supplying data in tab format in addition to the raw data, and that will allow you to skip the first conversion step.

3. What is the size limit on the raw data files?

The 32-bit version of our software can convert files up to 2 GB in size. The limits of the 64-bit version are determined by the amount of RAM in your computer; we routinely use it for files of 50-100 GB. If you believe that file size might be a problem for your analysis, you can break the raw data file into several smaller ones, process each one separately, and then sum the individual counts for each barcode.
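Summing the counts from several smaller files can be sketched as follows. This assumes the per-barcode counts from each output file have already been loaded into a dictionary; the barcode labels and the `merge_counts` helper are hypothetical, and reading the actual output columns is left out because their layout depends on your library.

```python
from collections import Counter


def merge_counts(per_file_counts):
    """Sum per-barcode counts across several partial results."""
    total = Counter()
    for counts in per_file_counts:
        total.update(counts)  # Counter.update adds counts rather than replacing them
    return total


# Hypothetical counts obtained from two pieces of one raw data file.
chunk_a = {"BC1": 10, "BC2": 3}
chunk_b = {"BC1": 5, "BC3": 7}

merged = merge_counts([chunk_a, chunk_b])
print(dict(merged))
```

A barcode that appears in only one of the pieces simply keeps its single count.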

4. What if I have raw data for one lane in multiple files, not one?

The 64-bit version of Deconvoluter allows you to select an entire folder as input. At the far right of the interface, next to “Input files,” there are three buttons: one selects an entire folder, one selects a specific file, and one clears the previous choice. Users of the 32-bit version can merge the files first with the free TXTcollector tool. Be sure to use the “no separator,” “no carriage returns,” and “no filename” settings.
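If you prefer a script to a merging tool, plain concatenation gives the same result as merging with no separator, no added line breaks, and no file names. The following is a minimal sketch; the `merge_files` function, the `*.txt` pattern, and the paths in the usage comment are assumptions to adapt to your own files.

```python
import shutil
from pathlib import Path


def merge_files(folder, output_path, pattern="*.txt"):
    """Concatenate all files matching `pattern` in `folder`, in name order.

    Nothing is inserted between files. Returns the number of files merged.
    """
    folder = Path(folder)
    output_path = Path(output_path)
    # Leave out the output file itself in case it lives in the same folder.
    parts = sorted(
        p for p in folder.glob(pattern)
        if p.is_file() and p.resolve() != output_path.resolve()
    )
    with open(output_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)  # streams, so large files are fine
    return len(parts)


# Example usage (hypothetical paths):
# merge_files("lane1_parts", "lane1_merged.txt")
```

Because nothing is added between files, each input file should end with a line break; otherwise the last line of one file runs into the first line of the next.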

5. What are the program options?

Library definition: lets you select the library annotation file; DECIPHER users should leave the other options at their defaults.

Inputs and Outputs: lets you select the name and location of the input files and the location for the output files.

Calculation parameters:

  • a choice of error correction level (exact matches only or one error correction — DECIPHER barcode design does not fully support two-error correction).
  • “Correct N symbols,” “Correct Insertions,” “Correct Indels” — these options are helpful for processing a particularly dirty input and you might enable them for your first analysis; however, for a clean sample and successful sequencing there are very few to none of these to correct. If you find that is the case for your samples, we recommend disabling these options to keep the output cleaner and the analysis faster.
  • “Use multiplexing” (NO LONGER RECOMMENDED OR SUPPORTED due to inconsistent results) is for screens employing more than one DECIPHER module. The vectors used for the modules differ in the two nucleotides adjacent to the barcode on the 5’ end: AT in the first module, CC in the second, and GG in the third. Note that if you are using the reverse HT sequencing primer (GexSeq), the sequences in the raw data files will be complementary (GG for the second module, CC for the third) and the identifier will be located on the 3’ end of the barcode. Nonetheless, the Deconvoluter settings should not be changed to account for complementarity, as the program recognizes the identifiers either way. The input data should be 20 bp-long sequences (18 bp for the barcode and 2 bp for the module identifier). With the multiplexing option enabled and the appropriate module identifiers entered, the program can process a file containing a mixture of two or three modules and produce separate output for each module.
    NOTE: Alternatively, a mixture of barcode sequences from multiple modules can be divided up manually from the TAB file by sorting on the module identifier. After the sequences are split into separate files, the Deconvoluter software can process each file using the corresponding single-module BLIB file.
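The manual split described in the note can be sketched in a few lines. This is only an illustration under stated assumptions: each read is a plain 20-base string with the module identifier as its first two bases (forward primer orientation), and the group names and the `split_by_module` helper are hypothetical. Reads from the reverse primer, where the identifier is complementary and sits at the 3’ end, are not handled here.

```python
from collections import defaultdict

# Module identifiers for the forward orientation, as listed above.
MODULE_IDS = {"AT": "module1", "CC": "module2", "GG": "module3"}


def split_by_module(reads, module_ids=MODULE_IDS, barcode_len=18):
    """Group reads by their 2-base module identifier.

    Each assigned read is stored as its barcode only (identifier removed).
    Reads of the wrong length or with an unknown identifier go to
    the "unassigned" group unchanged.
    """
    groups = defaultdict(list)
    for read in reads:
        seq = read.strip().upper()
        if len(seq) != barcode_len + 2:
            groups["unassigned"].append(seq)
            continue
        module = module_ids.get(seq[:2])
        if module is None:
            groups["unassigned"].append(seq)
        else:
            groups[module].append(seq[2:])
    return dict(groups)


reads = [
    "AT" + "A" * 18,  # first module
    "CC" + "C" * 18,  # second module
    "TT" + "G" * 18,  # unknown identifier
    "GG" + "T" * 17,  # wrong length
]
groups = split_by_module(reads)
for name, barcodes in groups.items():
    print(name, len(barcodes))
```

Each group can then be written to its own file and processed with the matching single-module library file.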

6. What is the output format?

The default output format is a .csv (comma-separated values) file. These files can be opened by any spreadsheet or data processing program, including Excel.

7. What is in the output files?

  1. Summary contains the sum totals for each of the analysis types you would do, i.e. number of skipped sequences, exact matches, and total number of sequences processed; also, if you request error correction, N symbol correction, correction of insertions and indels, it would contain statistics for these. This is useful to get a “bird’s eye view” of the data and troubleshoot some problems. Examples:
    • If you see that all of the sequences are skipped, and there are zero exact matches, that’s likely to mean that the sequences in your tab file are of the wrong length (e.g. 21 bases instead of 18)
    • If you see that you have ~10x more skipped sequences than matches, you might be using the wrong module library file (e.g. HDM2 instead of HDM3)
    • The ratio of skipped sequences to processed sequences allows you to estimate the quality of your sample prep. Under 5% skipped is a sign that your sample is good; a small fraction of skipped sequences is expected from the background mutation rate. A value above 15% is a cause for concern and a sign that your results might be unreliable. Note also that if you are using multiplexing/masking, the number of skipped sequences in the output file refers to the entire lane, not to individual indexed data.
  2. Skip file contains the individual sequences that had no match in the barcode library. It is useful for troubleshooting: you can look at individual non-matching sequences to check for patterns. Perhaps some of the sequences are shifted by one base or are missing a base; these problems might arise from primer issues. If there is a problem at the sequencing step, your HT sequencing service liaison will need this information to help you troubleshoot the issue.
  3. Output file has the data itself, i.e. individual counts for each barcode, and corresponding annotation (mRNA target sequence, gene symbol, RefSeq ID etc.).
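The skipped-sequence check from the Summary file can be expressed as a small calculation. The 5% and 15% thresholds are the ones given above; the function names and the labels returned are hypothetical, and the "intermediate" label for values between the two thresholds is an assumption, since that range is not characterized above.

```python
def skipped_fraction(skipped, processed):
    """Fraction of processed sequences that matched no barcode."""
    if processed <= 0:
        raise ValueError("processed must be a positive number of sequences")
    return skipped / processed


def assess_sample(skipped, processed):
    """Label a run using the 5% and 15% guideline thresholds."""
    fraction = skipped_fraction(skipped, processed)
    if fraction < 0.05:
        return "good"
    if fraction > 0.15:
        return "concern"
    return "intermediate"  # assumed label; not defined in the guidelines


# Hypothetical totals read from a Summary file.
print(assess_sample(skipped=30, processed=1000))
print(assess_sample(skipped=200, processed=1000))
```

With multiplexing or masking, remember that the skipped total refers to the entire lane, so the fraction should be computed against the lane total as well.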

For a discussion of visualization and interpretation, please refer to our Example DECIPHER shRNA Library Screening Data.

B. Troubleshooting

1. When Deconvoluter starts, I get an error message: “Unable to load DLL barcore.dll.”

Deconvoluter is compatible only with Mac OS X, Windows Vista, and Windows 7. Attempting to run it on Windows XP or an earlier operating system will generate the error message above.

2. The results appear wrong. What could be the problem?

The most common cause of problems is running raw data files that contain sequences of the wrong length. If you are working with DECIPHER libraries, please make sure the sequences in the raw data files are 18 bp long (or 20 bp if you are using multiplexing; see “Using the Software,” FAQ #5). Start troubleshooting by opening one of the raw data files manually (e.g. in Notepad) and checking the number of bases in the sequences. Sequences that are longer than 18 bp can still be used for analysis: after the raw data files are converted to tab format, the sequences in the tab file can be trimmed with an external tool. A Perl script is the simplest solution; UltraEdit Studio also allows manipulations like this; and it can be done in Excel by importing the tab file with the fixed-width option set to 18 characters.
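A trimming script only needs to keep the first 18 characters of each sequence, which is what the fixed-width Excel import does. The sketch below is one way to do it, assuming a file with one sequence per line; the `trim_reads` function and the file names in the usage comment are hypothetical.

```python
def trim_reads(lines, length=18):
    """Yield each non-empty line cut down to its first `length` characters.

    Lines shorter than `length` are passed through unchanged.
    """
    for line in lines:
        seq = line.rstrip("\r\n")
        if seq:
            yield seq[:length]


example = ["ACGTACGTACGTACGTACGT\n", "\n", "ACGT\n"]
trimmed = list(trim_reads(example))
print(trimmed)

# Example usage on files (hypothetical names):
# with open("lane1.tab") as src, open("lane1_trimmed.tab", "w") as dst:
#     for seq in trim_reads(src):
#         dst.write(seq + "\n")
```

If the extra bases in your data are at the start of each read rather than at the end, adjust the slice accordingly before relying on the output.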

Cellecta now offers Barcode Trimmer software for the PC, created by a DECIPHER user from the Karolinska Institute. Find out more on the Software page.

If you need an answer to a question not listed here, please email us or call us at 650-938-3910.