Genetic Sequence Modeling

Overview

In this project I built a C++ program that models gene expression in two steps: transcribing DNA into RNA, then translating the RNA into an amino acid sequence. The goal was to better understand both the biology of gene expression and the software design challenges of handling sequence data efficiently.

Background

DNA transcription and translation are foundational processes in molecular biology and are central to many bioinformatics pipelines. Modeling these processes computationally provides insight into how biological information is represented, transformed, and analyzed using software.

Approach

I implemented a C++ program that reads DNA sequences from files using the standard <fstream> header and performs transcription and translation based on codon mappings. The initial implementation used straightforward string processing. I am currently optimizing the translation step by moving the codon-to-amino-acid lookup into a hash map, which resolves each codon in constant time on average.

Technologies: C++, fstream

Results

The program successfully converts DNA input sequences into corresponding RNA transcripts and amino acid chains. Through this project, I gained hands-on experience with file I/O, string manipulation, and performance considerations in C++.

Limitations & Future Work

The current implementation assumes ideal input and does not yet handle sequencing errors or ambiguous nucleotides (for example, the IUPAC code N for an unknown base). Future improvements include optimizing performance further, adding validation and error handling, and extending the program to support larger genomic datasets.
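
As one possible starting point for the planned validation, a check like the one below could reject input before transcription. It is a sketch under stated assumptions rather than part of the current program: the function names are hypothetical, only uppercase A, C, G, and T are accepted, and an empty sequence is treated as invalid.

```cpp
#include <cstddef>
#include <string>

// Return the index of the first character that is not A, C, G, or T,
// or std::string::npos if every character is an unambiguous base.
std::size_t first_invalid_base(const std::string& dna) {
    return dna.find_first_not_of("ACGT");
}

// Assumption: an empty sequence counts as invalid input.
bool is_valid_dna(const std::string& dna) {
    return !dna.empty() && first_invalid_base(dna) == std::string::npos;
}
```

Returning the position of the offending character, rather than only a yes/no answer, makes it possible to report where in the file the problem occurs. Ambiguity codes such as N would be flagged by this check; supporting them properly would need a separate design decision.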

Key Takeaways

Strengthened understanding of gene expression and sequence biology
Applied C++ file handling and data structures to biological problems
Learned how algorithmic choices affect performance in bioinformatics tools