Dr. Ivan's Blog




About

Feature Extraction (FE) is Agilent's software suite for extracting and analysing CGH array data. Alongside advances in CGH technology, a number of other valuable tools have been created to analyze, visualize and otherwise process the data from such experiments. Unfortunately, not all of them can import FE data files directly. One such program is CGH Explorer (CGHe).

Fe2cghe (pronounced “eff-ee-to-see-gee-aitch-ee”) is a small converter designed to solve this problem. It also implements several additional features that remove the need to alter the raw data manually:

  1. Merging of any number of files
  2. Raw data files may be derived from arrays of any size
  3. Removing unnecessary columns
  4. Splitting columns (chromosome number, start and stop)
  5. Removing flagged values
  6. Removing controls
  7. Sorting data by two criteria: chromosome number and start position
  8. Merging duplicate entries
  9. Checking file consistency (cross-matching ProbeUIDs and number of lines)
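
Features 4, 7 and 8 from the list above can be illustrated with a short Python sketch. It is not Fe2cghe's actual code (the program itself is written in C++), and it rests on assumptions: the locus is taken to be a single string of the form `chr7:55086714-55086773`, duplicates are merged by averaging their values, and the names `split_locus`, `chromosome_rank` and `process` are hypothetical.

```python
import re
from collections import defaultdict

# Assumed locus format, e.g. "chr7:55086714-55086773" (illustrative value).
LOCUS_RE = re.compile(r"^chr(\w+):(\d+)-(\d+)$")

def split_locus(locus):
    """Split a locus string into (chromosome, start, stop)."""
    match = LOCUS_RE.match(locus)
    if match is None:
        raise ValueError(f"unrecognised locus: {locus!r}")
    chrom, start, stop = match.groups()
    return chrom, int(start), int(stop)

def chromosome_rank(chrom):
    """Numeric chromosomes first (in numeric order), then the rest alphabetically."""
    if chrom.isdigit():
        return (0, int(chrom), "")
    return (1, 0, chrom)

def process(rows):
    """rows: iterable of (locus, value) pairs.

    Returns (chromosome, start, stop, value) tuples with duplicate loci
    merged (here: averaged, an assumption) and sorted by chromosome
    number, then by start position.
    """
    grouped = defaultdict(list)
    for locus, value in rows:
        grouped[split_locus(locus)].append(value)
    merged = [(chrom, start, stop, sum(vals) / len(vals))
              for (chrom, start, stop), vals in grouped.items()]
    merged.sort(key=lambda row: (chromosome_rank(row[0]), row[1]))
    return merged
```

Sorting on a numeric rank rather than on the raw string keeps chromosome 2 ahead of chromosome 10, which a plain text sort would get wrong.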

After processing, the results are written to a master file that should contain all the data CGH Explorer needs and can be imported right away. The output files may also be used for other purposes.

If you encounter any problems with the version you are using, please contact me - and I will be most happy to help.

This program is distributed under the GPL (GNU General Public License) version 3. You can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version.






Downloads

Latest version is: 14.1.0 (last modified: 16.12.11)
Download it either as a zip-file or a tar.gz-file. Both archives contain the source code, the manual and a pre-compiled executable for Windows. The manual can also be viewed separately here.

NB! Probe classification is frequently updated by Agilent. To avoid incompatibilities between your files it is highly recommended to use the same Feature Extraction version to export all files you want to merge - and ensure that the same version of grid files and same genomic build are used.
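
To show what such an incompatibility looks like in practice, here is a minimal Python sketch of a consistency check in the spirit of "cross-matching ProbeUIDs and number of lines". It is an illustration only, not the converter's own implementation; the function name `check_consistency` and the input layout (a mapping from file name to the ProbeUID values in file order) are assumptions.

```python
def check_consistency(files):
    """files: dict mapping file name -> list of ProbeUID values in file order.

    Returns a list of human-readable problems; the list is empty when
    every file has the same number of lines and the same ProbeUID on
    every line as the first file.
    """
    problems = []
    names = list(files)
    if not names:
        return problems
    ref_name = names[0]
    ref = files[ref_name]
    for name in names[1:]:
        uids = files[name]
        if len(uids) != len(ref):
            problems.append(
                f"{name}: {len(uids)} lines, expected {len(ref)} as in {ref_name}")
            continue
        # Report only the first mismatch per file to keep the log short.
        for line_no, (expected, found) in enumerate(zip(ref, uids), start=1):
            if expected != found:
                problems.append(
                    f"{name}: ProbeUID {found} on line {line_no} "
                    f"does not match {expected} in {ref_name}")
                break
    return problems
```

Files exported with different grid files or FE versions would typically fail a check like this on the line count or on the first diverging ProbeUID.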





Version History

  • v.14.1.0: Added option to ignore all unknown chromosomes (now default behaviour) as well as ability to copy and clear output log. Speed-up on Windows (updated to MinGW GNU C++ compiler v. 4.4.1). Some small visual bug fixes and enhancements.

  • v.14.0.2: Column labels for each array now contain file name, not plate ID.
  • v.14.0.1: Probes belonging to chromosome "M" will for now be ignored (typo? mitochondrion?); minor GUI tweaks (more consistent fonts); small additions to the manual
  • v.14.0.0: This is a huge update. The most important changes are summarized below:
    • There is now a front-end (written in Java)
    • Speed-up on Windows (updated to MinGW GNU C++ compiler v. 3.4.5)
    • The manual has been almost completely rewritten. It is now better organized and includes screenshots and a section on file format specs.
    • Command line arguments have changed and no input from the user is required at runtime
    • Source code has been cleaned up, simplified and made more cross-platform
    • Cache size is no longer estimated; the default is 5
  • v.13.0.2: Implemented a chromosome format check. Description column in raw files is now optional.
  • v.13.0.1: Some changes have been made to the error messages which now should be slightly more helpful. Updated the manual to reflect changes in v.13.0.0.
  • v.13.0.0: A major upgrade. Instead of trying to catch up with every version of the FE file formats, columns are now loaded dynamically based on column headers. This dramatically expands the number of supported versions. Some of the helper functions have also been rewritten to make better use of various C++ features.
  • v.12.0.0: Brings three main features: 1) Both 44k and 244k CGH array data may now be merged; 2) Fixed another severe bug (SIGSEGV on duplicate value check); 3) Implemented file format retrieval from header (finally), so we are no longer dependent on properly named files. Also some minor internal clean-up and user input modifications. Manual updated accordingly.
  • v.11.0.2: Fixed a severe bug which crashed the program on unmapped values in files formatted according to 4.10.apr08.
  • v.11.0.1: Code cleanup and commenting. Updated manual (contents and design).
  • v.11.0.0: Support for FE-format version 4.10.apr08.
  • v.10.1.2: From now on all FE-values (log_10) are converted to log_2 internally. Small changes to Makefile.
  • v.10.1.1: Fixed a small bug introduced in 10.1.0 which prevented the Windows version from running at all; timer enabled for Windows; Linux script rewritten; manual changed accordingly
  • v.10.1.0: Implemented a workaround for a compiler bug in Windows (calling dtor deep within vector array before init on throw); startup script in Linux should now be interactive; manual changed accordingly; minor file open bug fixed; some code cleaning; a few minor archive changes; minor Makefile changes
  • v.10.0.0: Ugly bug in dupe remover fixed; manual cache definition for Windows users; huge code clean-up; some annoying user input bugs removed
  • v.9.5.1: A different algorithm to merge duplicates has been implemented, should now be 400% or so faster.
  • v.9.0.0: Duplicate removal has been implemented
  • v.8.0.0: From now on, we are (almost) 100% cross-platform. After a few minor changes, the abnormal termination previously seen with some STL and compiler versions no longer occurs.
  • v.7.5.4: A bug has been fixed when reusing io-streams for file r/w
  • v.7.5.3: Cleaned up the code, corrected some outdated comments, error throwing is now up to date
  • v.6.1.0: Caching implemented and working
  • v.5.0.0: Program should now be compatible with Windows paths.
  • v.4.0.0: Multiple files can now be processed at the same time.
  • v.3.1.0: Many bug fixes, everything should now be working more or less as expected.
  • v.2.0.0: Some speed enhancements.
  • v.1.0.0: Initial release, processes one single file
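
Two of the changes listed above, loading columns by header name (v.13.0.0) and converting log_10 values to log_2 (v.10.1.2), are easy to sketch in a few lines of Python. This is only an illustration under stated assumptions, not the program's C++ code: the helper names `column_index` and `log10_to_log2` are hypothetical, and the column names in the usage example are illustrative. The conversion itself is just the change-of-base rule, log_2(x) = log_10(x) / log_10(2).

```python
import math

def column_index(header, wanted):
    """Map each wanted column name to its position in the header row.

    Looking columns up by name means the reader does not depend on a
    fixed column order, which may differ between file format versions.
    """
    positions = {name: i for i, name in enumerate(header)}
    missing = [name for name in wanted if name not in positions]
    if missing:
        raise KeyError(f"missing columns: {missing}")
    return {name: positions[name] for name in wanted}

def log10_to_log2(value):
    """Convert a base-10 log ratio to base 2: log2(x) = log10(x) / log10(2)."""
    return value / math.log10(2)
```

For example, `column_index(["FEATURES", "ProbeUID", "LogRatio"], ["LogRatio", "ProbeUID"])` finds both columns regardless of where they sit in the row, and a log_10 ratio passed to `log10_to_log2` is scaled by roughly 3.32.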





Contact

If you have any questions regarding use of this program, want to report a bug or have some feedback, please email me.




Links