Molecular Biology
The complete guide to DNA structure and replication, the central dogma, transcription and translation, prokaryotic and eukaryotic gene regulation, recombinant DNA technology, CRISPR genome editing, PCR, gel electrophoresis, epigenetics, RNA biology, and the molecular mechanisms driving medicine, biotechnology, and our understanding of life itself.
In 1953, James Watson and Francis Crick published a 900-word paper in Nature proposing a double-helical structure for DNA that, as they famously understated, “suggests a possible copying mechanism for the genetic material.” That understatement concealed a revolution. The structure explained, in a single elegant molecular model, how genetic information could be faithfully copied (through complementary base pairing), how mutations could arise (through copying errors or chemical damage), and how information encoded in sequence could in principle specify the structure of proteins. In the seven decades since, molecular biology has moved from that first structural insight to sequencing entire genomes in hours, editing genes in a wide range of organisms with near single-nucleotide precision, and developing molecular medicines for diseases that were previously untreatable. Understanding molecular biology (its foundational concepts, its experimental tools, and its clinical applications) is now not merely useful for scientists but essential context for medicine, public health, biotechnology policy, and informed citizenship in an era when the molecular basis of life shapes virtually every significant biological advance.
Molecular Biology — Scope, History, and the Questions That Define the Field
Molecular biology is the study of biological processes at the molecular level — primarily the structure, function, and interactions of the nucleic acids (DNA and RNA) and proteins that carry, express, and regulate genetic information. It is both a discipline in its own right and a foundation for virtually every area of modern biology and medicine. Microbiology, cell biology, genetics, developmental biology, neuroscience, pharmacology, and cancer biology all depend on molecular biological understanding to interpret their observations and design their experiments.
The field took shape in the 1940s and 1950s through a series of landmark discoveries that established what genetic information is and how it works: Avery, MacLeod, and McCarty’s 1944 demonstration that DNA (not protein) is the transforming principle in bacteria; Chargaff’s 1950 base composition rules (A = T, G = C in any DNA sample); Hershey and Chase’s 1952 phage experiment confirming DNA as the genetic material; and Watson and Crick’s 1953 double helix model. The following two decades established the central dogma, cracked the genetic code, characterised the ribosome, identified restriction enzymes, and produced the first recombinant DNA molecules — laying the technical foundation for the biotechnology industry and modern genomic medicine.
DNA Structure — The Double Helix, Base Pairing, and the Chemical Basis of Heredity
DNA (deoxyribonucleic acid) is the molecule that stores genetic information in all cellular life forms and many viruses (other viruses use RNA genomes). Its structure, elucidated by Watson and Crick in 1953 using X-ray diffraction data primarily from Rosalind Franklin and Maurice Wilkins, reveals a molecular architecture perfectly suited to its biological functions: faithful copying through semiconservative replication, long-term information storage in a chemically stable form, and encoding of information in sequence that can be read and regulated by proteins.
The Chemical Components of DNA
Each nucleotide monomer of DNA consists of three components: a deoxyribose sugar (lacking the 2′-OH of ribose in RNA), a phosphate group (connecting the 3′ carbon of one nucleotide to the 5′ carbon of the next via a phosphodiester bond), and one of four nitrogenous bases — the purines adenine (A) and guanine (G), and the pyrimidines thymine (T) and cytosine (C). The phosphodiester backbone is uniformly charged (negative) and forms the structural framework of each strand; the bases project inward and are responsible for sequence-specific interactions — both with the complementary strand and with regulatory proteins that read the DNA sequence.
B-FORM HELIX (predominant in cells):
Diameter: ~2 nm (20 Å)
Rise per base pair: 0.34 nm (3.4 Å)
Helix pitch: ~3.4–3.6 nm (one full turn ≈ 10–10.5 bp)
Major groove: ~2.2 nm wide; principal binding site for sequence-specific proteins
Minor groove: ~1.2 nm wide; binding site for some drugs (netropsin, actinomycin D)
Strand orientation: antiparallel (one 5′→3′, complementary 3′→5′)

WATSON-CRICK BASE PAIRS:
A ··· T (adenine–thymine): 2 hydrogen bonds (weaker)
G ··· C (guanine–cytosine): 3 hydrogen bonds (stronger)
Chargaff rules: [A] = [T]; [G] = [C] in any dsDNA sample
%GC content varies by organism: bacteria 25–75%; human genome ~41%
Higher %GC → higher melting temperature (more H-bonds per bp)

DNA TOPOLOGY:
Relaxed circular DNA: no supercoiling tension
Positively supercoiled: overwound (forms ahead of the replication fork)
Negatively supercoiled: underwound (predominant in cells; facilitates strand separation)
Topoisomerase I: relaxes supercoils (the bacterial enzyme acts on negative supercoils; the eukaryotic enzyme relaxes both positive and negative)
DNA gyrase (a bacterial type II topoisomerase): introduces negative supercoils (antibacterial target)
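The base-pairing and Chargaff rules listed above can be checked on any sequence with a few lines of Python. This is a minimal sketch: the function names are illustrative, and it assumes an unambiguous uppercase or lowercase A/T/G/C sequence written 5′→3′.

```python
# Watson-Crick partners for DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq: str) -> str:
    """Return the antiparallel partner strand, written 5'->3'."""
    return "".join(COMPLEMENT[b] for b in reversed(seq.upper()))

def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C (0.0 to 1.0)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def duplex_base_counts(seq: str) -> dict:
    """Count each base over both strands of the duplex formed by seq."""
    both = seq.upper() + reverse_complement(seq)
    return {b: both.count(b) for b in "ATGC"}

def hydrogen_bonds(seq: str) -> int:
    """Total Watson-Crick hydrogen bonds: 2 per A-T pair, 3 per G-C pair."""
    return sum(3 if b in "GC" else 2 for b in seq.upper())

counts = duplex_base_counts("AAGC")
print(counts)                      # A equals T and G equals C in the duplex
print(hydrogen_bonds("AAGC"))      # 2 + 2 + 3 + 3
```

Counting over both strands is what makes Chargaff's equalities hold: a single strand can have any composition, but every A on one strand is matched by a T on the other.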
Chromatin — Packaging DNA in the Eukaryotic Nucleus
A human cell must fit approximately 2 metres of linear DNA into a nucleus ~6 μm in diameter, a packaging challenge that requires ~10,000-fold compaction. This is achieved through successive levels of DNA organisation, beginning with the nucleosome: approximately 147 bp of DNA wrapped around an octamer of histone proteins (two each of H2A, H2B, H3, and H4) to form the ~11-nm “beads-on-a-string” nucleosomal array. Linker histone H1 associates with the linker DNA between nucleosomes, facilitating further folding into more compact fibres (classically described as 30-nm fibres, although their prevalence in living cells is debated) and higher-order chromatin domains. At the level of the nucleus, DNA is organised into topologically associating domains (TADs), regions of up to about a megabase within which sequences contact one another preferentially, and into larger compartments that segregate active (A compartment) and inactive (B compartment) chromatin. Condensed, transcriptionally silent chromatin is called heterochromatin; open, transcriptionally accessible chromatin is euchromatin. Nucleosome positioning and histone modification state critically regulate gene expression by controlling transcription factor and RNA polymerase access to DNA, which is the molecular basis of epigenetic regulation.
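The “2 metres” figure follows directly from the B-form rise per base pair. The sketch below reproduces the arithmetic; the diploid genome size (~6.4 × 10⁹ bp) and the ~200 bp nucleosome repeat length (core plus linker) are assumptions added here for illustration and are not stated in the text.

```python
RISE_PER_BP_NM = 0.34        # B-form rise per base pair (from the text)

# ASSUMPTIONS for illustration only (not given in the text):
DIPLOID_GENOME_BP = 6.4e9    # approximate diploid human genome size
NUCLEOSOME_REPEAT_BP = 200   # ~147 bp core DNA plus linker DNA

def dna_length_m(base_pairs: float, rise_nm: float = RISE_PER_BP_NM) -> float:
    """Contour length in metres of a B-form duplex of the given size."""
    return base_pairs * rise_nm * 1e-9

def nucleosome_count(base_pairs: float, repeat_bp: float = NUCLEOSOME_REPEAT_BP) -> float:
    """Rough number of nucleosomes needed to package the given DNA."""
    return base_pairs / repeat_bp

print(f"{dna_length_m(DIPLOID_GENOME_BP):.2f} m of DNA per diploid nucleus")
print(f"{nucleosome_count(DIPLOID_GENOME_BP):.1e} nucleosomes (rough estimate)")
```

Under these assumptions the total comes to a little over 2 m, packaged in tens of millions of nucleosomes.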
DNA Replication — Semiconservative Copying, the Replisome, and Maintaining Fidelity
Watson and Crick’s 1953 paper noted that the complementary base-pairing of the double helix “immediately suggests a possible copying mechanism” — if the two strands separate, each could serve as a template for synthesis of a new complementary strand, producing two identical daughter duplexes. This semiconservative replication model was confirmed by Meselson and Stahl’s elegant 1958 experiment using density-labelled nitrogen isotopes, which showed that each daughter DNA molecule retains one parental strand and contains one newly synthesised strand. The molecular machinery that executes this copying — the replisome — is one of the most sophisticated biological machines known.
Origins of Replication
Replication begins at defined chromosomal sequences called origins of replication. The simple bacterium E. coli has a single origin (oriC) on its circular chromosome, where the DnaA initiator protein binds, melts the duplex, and recruits the replicative helicase (DnaB). The human genome has approximately 30,000–50,000 origins — necessary because eukaryotic chromosomes are too large to replicate from a single origin within the time constraints of S phase. Origins are licensed for replication by loading the MCM2-7 helicase complex during G1; firing of licensed origins is triggered during S phase by CDK2-cyclin E/A and DDK (Dbf4-dependent kinase) activities. Each origin, once fired, replicates bidirectionally — two replication forks travel in opposite directions from each origin.
Replication Fidelity — Three Levels
The remarkable accuracy of DNA replication (~1 error per 10⁹–10¹⁰ base pairs) is achieved through three sequential mechanisms. First, nucleotide selection: the induced-fit conformational change of the polymerase fingers domain preferentially incorporates correctly base-paired dNTPs (~1 error in 10⁵). Second, the 3′→5′ proofreading exonuclease activity of the replicative polymerase removes incorrectly incorporated nucleotides before the next addition (~100-fold improvement). Third, mismatch repair (MMR): post-replicative scanning by MutS/MutL proteins identifies and corrects residual mismatches (~100-fold further improvement, and sometimes more). Because the three steps act in series, their contributions multiply: an error rate of ~10⁻⁵ at the selection step becomes roughly 10⁻⁹–10⁻¹⁰ per base pair copied, an improvement of about 10,000-fold or more over polymerase selectivity alone.
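Because the three fidelity mechanisms act in series, their contributions multiply. The following sketch works through that arithmetic with exact fractions, using the approximate figures quoted above; the diploid genome size used in the last step (~6.4 × 10⁹ bp) is an assumption added for illustration.

```python
from fractions import Fraction

# Approximate figures quoted in the text.
SELECTION_ERROR = Fraction(1, 10**5)   # errors per base after nucleotide selection
PROOFREADING_GAIN = 100                # fold improvement from 3'->5' exonuclease
MISMATCH_REPAIR_GAIN = 100             # fold improvement from MutS/MutL repair

def overall_error_rate(selection=SELECTION_ERROR,
                       proofreading=PROOFREADING_GAIN,
                       mmr=MISMATCH_REPAIR_GAIN) -> Fraction:
    """Errors per base pair after all three mechanisms act in sequence."""
    return selection / proofreading / mmr

def expected_errors(genome_bp: int, error_rate: Fraction) -> Fraction:
    """Expected number of new errors in one round of genome replication."""
    return genome_bp * error_rate

rate = overall_error_rate()
print(rate)                                   # 1/1000000000
# ASSUMPTION: diploid human genome of ~6.4e9 bp.
print(float(expected_errors(6_400_000_000, rate)))
```

With these inputs the combined rate is 10⁻⁹ per base pair, which corresponds to only a handful of new errors per genome copy under the assumed genome size.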
In 1958, Matthew Meselson and Franklin Stahl grew E. coli in medium containing heavy nitrogen (¹⁵N) until all cellular DNA was fully labelled with ¹⁵N. They then transferred cells to normal (¹⁴N) medium and allowed one, two, and three rounds of replication. After centrifuging DNA in a caesium chloride density gradient (which separates DNA by buoyant density), the results were unambiguous: after one replication, all DNA migrated at an intermediate density (one heavy strand, one light strand — confirming semiconservative copying). After two replications, DNA appeared at two positions — intermediate and light — in a 1:1 ratio, consistent with only semiconservative replication. The experiment excluded conservative replication (which would show only heavy and light bands after one round) and dispersive replication (which would show a single intermediate band after both rounds that shifted lighter with each successive division).
The Meselson-Stahl experiment is frequently cited as “the most beautiful experiment in biology” — it used an elegant physical technique (density gradient centrifugation) to distinguish between three mechanistically distinct models with a single experiment, producing results that were definitive, visually clear, and immediately interpretable.
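The logic of the experiment can be sketched as a small simulation that predicts the density bands expected under each of the three models. This is an illustrative model only: each duplex is reduced to its fraction of heavy (¹⁵N) material, and the function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction

def heavy_fraction(duplex) -> Fraction:
    """Fraction of a two-strand duplex that is heavy ('H') strand."""
    return Fraction(duplex.count("H"), 2)

def semiconservative(generations: int) -> Counter:
    """Each parental strand templates one new light strand."""
    duplexes = [("H", "H")]
    for _ in range(generations):
        duplexes = [(strand, "L") for d in duplexes for strand in d]
    return Counter(heavy_fraction(d) for d in duplexes)

def conservative(generations: int) -> Counter:
    """The parental duplex stays intact; the copy is entirely new."""
    duplexes = [("H", "H")]
    for _ in range(generations):
        duplexes = [x for d in duplexes for x in (d, ("L", "L"))]
    return Counter(heavy_fraction(d) for d in duplexes)

def dispersive(generations: int) -> Counter:
    """Parental material is split evenly between both daughter duplexes."""
    fractions = [Fraction(1)]
    for _ in range(generations):
        fractions = [f / 2 for f in fractions for _ in range(2)]
    return Counter(fractions)

for model in (semiconservative, conservative, dispersive):
    for g in (1, 2):
        bands = {float(k): v for k, v in sorted(model(g).items())}
        print(f"{model.__name__:17s} generation {g}: {bands}")
```

Each result maps a band position (heavy fraction 1.0, 0.5, 0.0, and so on) to the number of duplexes at that density. Only the semiconservative model gives a single intermediate band after one round and equal intermediate and light bands after two.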
The Central Dogma — Information Flow from DNA to RNA to Protein
Francis Crick coined the term “central dogma” in 1958 to describe what was then a hypothesis about the directionality of biological information transfer. In its most general form, it states: information stored in nucleic acid sequences can be transferred to other nucleic acid sequences or to protein sequences, but information in protein sequences cannot be transferred back to nucleic acids. The specific transfers that normally occur in cells are: DNA replication (DNA → DNA), transcription (DNA → RNA), and translation (RNA → protein). The transfer RNA → DNA (reverse transcription by retroviral reverse transcriptase) is often described as an exception, although Crick’s formulation allows it as a special transfer, because information still passes from one nucleic acid to another. Protein → nucleic acid and protein → protein transfers have never been demonstrated for sequence information under normal cellular conditions (though prion propagation involves protein-directed protein conformational change without sequence transfer).
DNA Replication — Faithful Copying of the Genome
Before cell division, the entire genome must be duplicated. DNA polymerase uses each parental strand as a template to synthesise a complementary daughter strand, producing two identical daughter duplexes. The fidelity of replication is approximately 1 error per 10⁹–10¹⁰ nucleotides — essential for maintaining genome integrity across trillions of cell divisions in a human lifetime. Replication occurs during S phase of the cell cycle and is tightly coupled to cell cycle checkpoints that verify completion before division.
Transcription — Converting DNA Information into RNA
RNA polymerase reads the template (antisense) DNA strand 3′→5′ and synthesises a complementary RNA strand 5′→3′. In eukaryotes, three RNA polymerases divide transcription: Pol I transcribes ribosomal RNA genes; Pol II transcribes protein-coding genes (mRNA) and most non-coding RNAs; Pol III transcribes tRNA, 5S rRNA, and small non-coding RNAs. Transcription is the primary control point for gene expression: the rate of transcription initiation determines how much of any given mRNA is present in the cell, and therefore how much of the encoded protein can be made.
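The strand relationships described above can be made concrete with a short sketch. It assumes the template strand is supplied in the direction the polymerase reads it (3′→5′) and shows that the resulting RNA matches the coding strand with U in place of T; the function names are illustrative.

```python
# RNA base that pairs with each DNA template base.
RNA_PARTNER = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_3to5: str) -> str:
    """Build the RNA 5'->3' from a template strand written 3'->5'."""
    return "".join(RNA_PARTNER[b] for b in template_3to5.upper())

def rna_from_coding(coding_5to3: str) -> str:
    """The RNA has the same sequence as the coding strand, with U for T."""
    return coding_5to3.upper().replace("T", "U")

template = "TACGGT"   # 3'-TACGGT-5'
coding = "ATGCCA"     # 5'-ATGCCA-3' (the complementary strand)
print(transcribe(template))      # AUGCCA
print(rna_from_coding(coding))   # AUGCCA
```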
RNA Processing — Preparing mRNA for Translation
In eukaryotes, the primary transcript (pre-mRNA) requires extensive processing before translation: 5′ capping (addition of a 7-methylguanosine cap that protects against degradation and aids ribosome recruitment), 3′ polyadenylation (addition of a ~200 nucleotide poly-A tail that aids export and stability), and splicing (removal of introns and joining of exons by the spliceosome). Alternative splicing of the same pre-mRNA can produce multiple different protein isoforms from a single gene — greatly expanding proteome diversity beyond the ~20,000 protein-coding genes in the human genome.
Translation — Decoding mRNA Sequence into Protein
The ribosome reads the mRNA sequence in triplet codons (5′→3′) and synthesises the encoded polypeptide chain. Each codon specifies a particular amino acid (or a start/stop signal) via the nearly universal genetic code. Aminoacyl-tRNAs (tRNA molecules covalently linked to their cognate amino acid by aminoacyl-tRNA synthetases) serve as adaptors, presenting the correct amino acid to the ribosome’s A-site when the anticodon of the tRNA matches the codon of the mRNA. After synthesis, the polypeptide is folded (assisted by chaperone proteins), post-translationally modified, and targeted to its correct cellular location.
Transcription — Initiating, Elongating, and Terminating RNA Synthesis
Transcription is the synthesis of an RNA molecule using a DNA template, catalysed by RNA polymerase (RNAP). Unlike DNA polymerase, RNA polymerase can initiate RNA synthesis de novo: it does not require a primer with a free 3′-OH, because its active site can bind and position the initiating NTP together with the second NTP and catalyse the first phosphodiester bond between them. The initiating nucleotide keeps its 5′ triphosphate, and pyrophosphate is released from each subsequently added NTP. RNA polymerase also has lower fidelity than DNA polymerase (~1 error per 10⁴–10⁵ nucleotides) and only limited proofreading (by backtracking and cleaving the nascent transcript), which is acceptable because RNA is a transient product rather than a heritable record.
Prokaryotic Transcription
In bacteria, a single RNA polymerase (core enzyme: α₂ββ′ω) is responsible for transcribing all RNA species. The sigma (σ) factor confers promoter recognition specificity: different sigma factors direct the core RNAP to different promoter classes, allowing the cell to shift gene expression programmes in response to stress (σ⁷⁰ for exponential growth; σ³² for heat shock; σ⁵⁴ for nitrogen limitation). Prokaryotic promoters have two conserved sequence elements: the –10 element (consensus TATAAT) and –35 element (consensus TTGACA) upstream of the transcription start site. σ factor contacts these elements and positions RNAP, then is released after initiation. Termination occurs either by Rho-independent (intrinsic) termination, in which a GC-rich hairpin followed by a run of U residues causes RNAP to pause and dissociate, or by Rho-dependent termination, in which the Rho helicase tracks the mRNA and dislodges RNAP at pause sites.
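A simple way to see how the two consensus elements define a promoter is to scan a sequence for the best-matching –35/–10 pair. The sketch below is a naive illustration, not a real promoter predictor: it scores plain identity to the two consensus hexamers and assumes a spacer of 16–18 bp between them (the spacer range is an assumption added here; it is not stated in the text).

```python
MINUS_35 = "TTGACA"   # consensus from the text
MINUS_10 = "TATAAT"   # consensus from the text

def matches(hexamer: str, consensus: str) -> int:
    """Number of positions at which the hexamer matches the consensus."""
    return sum(a == b for a, b in zip(hexamer, consensus))

def best_promoter(seq: str, spacers=(16, 17, 18)):
    """Return (score, pos_35, pos_10, spacer) for the best hexamer pair.

    Score is the total number of consensus matches (maximum 12).
    Positions are 0-based indices of the first base of each hexamer.
    ASSUMPTION: spacer lengths of 16-18 bp between the two elements.
    Returns None if the sequence is too short to hold both elements.
    """
    seq = seq.upper()
    best = None
    for i in range(len(seq) - 5):
        score_35 = matches(seq[i:i + 6], MINUS_35)
        for spacer in spacers:
            j = i + 6 + spacer
            if j + 6 > len(seq):
                continue
            score = score_35 + matches(seq[j:j + 6], MINUS_10)
            if best is None or score > best[0]:
                best = (score, i, j, spacer)
    return best

# Synthetic test sequence with perfect elements and a 17-bp spacer.
example = "GG" + MINUS_35 + "C" * 17 + MINUS_10 + "GGGG"
print(best_promoter(example))
```

Real promoters rarely match the consensus perfectly, and promoter strength depends on more than hexamer identity, so a score like this is only a first approximation.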
RNA Processing — Splicing, the Spliceosome, and Expanding the Proteome
The discovery of introns in 1977 by Richard Roberts and Phillip Sharp (Nobel Prize 1993) revealed that eukaryotic genes are discontinuous: protein-coding sequences (exons) are interrupted by non-coding sequences (introns) that must be precisely removed from the pre-mRNA before translation. This process — RNA splicing — is catalysed by the spliceosome, one of the most complex macromolecular machines in the cell, and it fundamentally changes the relationship between gene number and protein diversity.
The Spliceosome — Five snRNPs, One Machine
The spliceosome is assembled from five small nuclear ribonucleoprotein complexes (snRNPs) — U1, U2, U4, U5, and U6 — each containing a snRNA and associated proteins. Assembly is sequential: U1 snRNP recognises the 5′ splice site; U2 snRNP binds the branch point adenosine (typically ~20–50 nt upstream of the 3′ splice site); U4/U6 and U5 join to form the active spliceosome. Two transesterification reactions (catalysed by the RNA component — the spliceosome is a ribozyme) remove the intron as a lariat structure and join the flanking exons. The entire spliceosome disassembles and recycles after each splicing event.
Alternative Splicing — One Gene, Many Proteins
A single pre-mRNA can be spliced in different ways in different cell types, developmental stages, or in response to signals, producing multiple mRNA isoforms that encode different protein variants. Alternative splicing greatly expands the proteome from a fixed gene number: it is estimated that >90% of human multi-exon genes undergo alternative splicing. The neurexin gene family exemplifies extreme alternative splicing: neurexins can generate thousands of protein isoforms from three genes through a combination of alternative promoters, alternative exon inclusion, and alternative 3′ splice site use. Alternative splicing is regulated by splicing regulators (SR proteins generally promoting exon inclusion; hnRNP proteins generally promoting exclusion) that bind exonic and intronic splicing enhancers and silencers.
The 5′ Cap — Translation Initiation and mRNA Stability
Within seconds of transcription initiation, the 5′ end of the nascent RNA is modified by addition of a 7-methylguanosine cap via an unusual 5′–5′ triphosphate linkage. The cap is added co-transcriptionally when the transcript is approximately 20–30 nucleotides long, by capping enzyme recruited by the phosphorylated CTD of RNA Pol II. The cap serves multiple functions: it protects the mRNA from 5′→3′ exonuclease degradation, is recognised by the cap-binding complex (CBC) for nuclear export, is recognised by eIF4E for ribosome recruitment during translation initiation, and marks the mRNA as a legitimate cellular transcript (distinguishing it from viral or aberrant RNAs).
Polyadenylation — Adding the Poly-A Tail
After cleavage of the pre-mRNA at a polyadenylation signal (typically the hexanucleotide AAUAAA ~10–30 nt upstream of the cleavage site), poly-A polymerase adds approximately 150–250 adenosine residues to the 3′ end without a template. The poly-A tail binds poly-A binding protein (PABP), which protects the mRNA from 3′→5′ degradation, stimulates translation by circularising the mRNA through eIF4G-PABP interaction (promoting ribosome recycling), and facilitates nuclear export. Alternative polyadenylation (selection of different cleavage and polyadenylation signals in a pre-mRNA) produces mRNA isoforms with different 3′ untranslated regions (UTRs) that differ in stability, translation efficiency, and microRNA responsiveness.
miRNA, lncRNA, and the Non-Coding Transcriptome
The majority of the human genome is transcribed, but the majority of transcripts are not translated. MicroRNAs (miRNAs, ~22 nt) are processed from hairpin precursors by Drosha and Dicer, then loaded into the RISC complex where they guide sequence-specific binding to 3′ UTRs of target mRNAs, causing translational repression or mRNA degradation. Each miRNA can regulate hundreds of targets, and many mRNAs carry binding sites for several miRNAs, creating a complex regulatory network. Long non-coding RNAs (lncRNAs, >200 nt) regulate gene expression through diverse mechanisms: scaffolding chromatin-modifying complexes (XIST in X-chromosome inactivation), acting as enhancer RNAs, and acting as competing endogenous RNAs that sequester miRNAs. The regulatory capacity of the non-coding transcriptome is now understood to be extensive and essential to normal development and physiology.
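Sequence-specific miRNA targeting can be illustrated by searching a 3′ UTR for sites complementary to the miRNA. The sketch below relies on an assumption not made in the text: that targeting is dominated by the “seed” (nucleotides 2–8 of the miRNA), so a candidate site is the reverse complement of that 7-nt stretch. The let-7a sequence is used as the example miRNA, and the UTR is a made-up test string.

```python
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def seed_site(mirna: str) -> str:
    """Target-site sequence (5'->3') complementary to miRNA nucleotides 2-8.

    ASSUMPTION: a perfect match to the 7-nt seed is used as the targeting
    criterion; real target prediction uses additional context features.
    """
    seed = mirna.upper()[1:8]
    return "".join(RNA_COMPLEMENT[b] for b in reversed(seed))

def find_seed_sites(utr: str, mirna: str) -> list:
    """0-based start positions of seed-match sites in a 3' UTR."""
    site = seed_site(mirna)
    utr = utr.upper().replace("T", "U")   # accept DNA or RNA alphabets
    return [i for i in range(len(utr) - len(site) + 1)
            if utr[i:i + len(site)] == site]

LET7A = "UGAGGUAGUAGGUUGUAUAGUU"          # let-7a, 22 nt
hypothetical_utr = "AAACUACCUCAGGG"       # made-up UTR containing one site
print(seed_site(LET7A))                   # CUACCUC
print(find_seed_sites(hypothetical_utr, LET7A))
```

Because a 7-nt match occurs by chance fairly often in long UTRs, a scan like this overpredicts targets; it is meant only to show why one miRNA can plausibly regulate hundreds of mRNAs.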
Nonsense-Mediated Decay — Quality Control for mRNA
Nonsense-mediated mRNA decay (NMD) is a cellular quality control pathway that detects and degrades mRNAs containing premature termination codons (PTCs) — preventing the translation of potentially dominant-negative truncated proteins. NMD depends on the exon junction complex (EJC) deposited on mRNA at exon-exon junctions during splicing: a ribosome encountering a PTC more than ~50–55 nt upstream of an EJC triggers NMD, activating UPF1/2/3-mediated mRNA decapping and degradation. NMD is important for disease understanding: many disease-causing nonsense mutations are subject to NMD — the severity of some genetic diseases (e.g., cystic fibrosis, Duchenne muscular dystrophy) depends partly on whether the mutant transcript escapes or is eliminated by NMD.
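The positional rule described above lends itself to a small predictor. The sketch below uses a common simplification, stated here as an assumption: instead of modelling individual EJCs, it asks whether the premature stop codon ends more than ~50 nt upstream of the last exon-exon junction. The exon lengths and coordinates in the example are hypothetical.

```python
def predicts_nmd(exon_lengths, ptc_start: int, threshold: int = 50) -> bool:
    """Predict whether a premature termination codon is likely to trigger NMD.

    exon_lengths: lengths of the exons in the spliced mRNA, in order.
    ptc_start:    0-based mRNA coordinate of the first base of the stop codon.
    threshold:    minimum distance (nt) between the end of the stop codon and
                  the last exon-exon junction.

    ASSUMPTION: the last exon-exon junction stands in for the position of the
    most downstream exon junction complex.
    """
    if len(exon_lengths) < 2:
        return False                      # no junctions, so no EJC-dependent NMD
    last_junction = sum(exon_lengths[:-1])
    ptc_end = ptc_start + 3
    return last_junction - ptc_end > threshold

exons = [100, 200, 150]                   # hypothetical transcript; last junction at 300
print(predicts_nmd(exons, 120))           # stop far upstream of the last junction
print(predicts_nmd(exons, 280))           # stop within ~50 nt of the last junction
print(predicts_nmd(exons, 350))           # stop in the last exon
```

Under this rule, stop codons in the last exon or close to the final junction are predicted to escape NMD, which is one reason nonsense mutations at different positions in the same gene can differ in severity.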
Translation — The Ribosome, the Genetic Code, and Protein Synthesis
Translation is the decoding of the mRNA nucleotide sequence into the amino acid sequence of a polypeptide. It is the most energy-intensive process in the cell: a rapidly growing bacterial cell devotes approximately 80% of its total biosynthetic capacity to ribosome production and protein synthesis. The ribosome is the molecular machine at the centre of translation: a two-subunit ribonucleoprotein complex (small subunit + large subunit) that reads the mRNA in triplet codons, recruits aminoacyl-tRNAs carrying the appropriate amino acid, and catalyses peptide bond formation between successive amino acids. This peptidyl transferase activity resides in the ribosomal RNA of the large subunit, so the ribosome, like the spliceosome, is a ribozyme.
The Genetic Code — 64 Codons, 20 Amino Acids, and Degeneracy
The genetic code maps all 64 possible triplet codons to either one of the 20 standard amino acids or to a stop signal. Because there are 64 codons but only 20 amino acids, the code is degenerate — multiple codons (synonymous codons) specify the same amino acid. Most amino acids are encoded by 2–4 codons; arginine, leucine, and serine each have 6 codons. Only methionine (AUG, also the start codon) and tryptophan (UGG) are encoded by a single codon. Three codons (UAA, UAG, UGA) do not specify amino acids but signal termination of translation.
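The degeneracy figures above can be verified by building the standard codon table and counting codons per amino acid. In this sketch the table is generated from the conventional compact layout (bases in U, C, A, G order, one-letter amino acid codes, `*` for stop); the variable names are illustrative.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of codons assigned to each amino acid (or to stop, '*').
DEGENERACY = Counter(CODON_TABLE.values())

print(len(CODON_TABLE))                           # 64 codons
print(sorted(aa for aa, n in DEGENERACY.items() if n == 6))
print(sorted(aa for aa, n in DEGENERACY.items() if n == 1))
print(sorted(c for c, aa in CODON_TABLE.items() if aa == "*"))
```

The six-codon amino acids come out as L, R, and S (leucine, arginine, serine), the single-codon ones as M and W (methionine, tryptophan), and the three stop codons as UAA, UAG, and UGA.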
The code is read in a continuous, non-overlapping, non-punctuated series of triplets from the AUG start codon. The reading frame — which triplet grouping is used — is set by the initiator AUG and maintained by the ribosome throughout elongation. Frameshift mutations (insertions or deletions of non-multiples of 3) shift the reading frame and change the identity of all downstream codons, typically producing a premature stop codon — a truncated, non-functional protein.
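The effect of a frameshift is easy to demonstrate with a minimal translator. The sketch below starts at the first AUG, reads non-overlapping triplets, and stops at the first stop codon; the short mRNA is a made-up example chosen so that deleting one base creates an immediate premature stop, while deleting three bases preserves the frame.

```python
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(mrna: str) -> str:
    """Translate from the first AUG to the first in-frame stop codon."""
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

wild_type = "AUGCUAAGCCGUUAA"                 # AUG CUA AGC CGU UAA
one_deleted = wild_type[:3] + wild_type[4:]   # frameshift: AUG UAA GCC ...
three_deleted = wild_type[:3] + wild_type[6:] # in-frame deletion of one codon

print(translate(wild_type))       # MLSR
print(translate(one_deleted))     # M
print(translate(three_deleted))   # MSR
```

The single-base deletion changes every downstream codon and truncates the product after one residue, whereas the three-base deletion removes one amino acid and leaves the rest of the protein intact.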
The genetic code is nearly universal — the same codon table applies from bacteria to humans — with only minor variations in mitochondria and some protists. This universality both confirms the common ancestry of all life and enables the expression of genes from one organism in another (heterologous expression) — the basis of recombinant protein production and gene therapy.
Prokaryotic Gene Regulation — Operons, Repressors, and Metabolic Responsiveness
Gene regulation allows cells to adjust protein synthesis in response to changing environmental conditions — producing metabolic enzymes only when their substrates are available, and repressing energetically costly biosynthetic pathways when their end-products are abundant. In prokaryotes, regulation at the transcriptional level is achieved primarily through operons — clusters of functionally related genes transcribed as a single polycistronic mRNA unit, controlled by shared regulatory sequences including the promoter and operator.
LAC OPERON STRUCTURE:
lacI: repressor gene (constitutively expressed)
P: promoter (RNA Pol binding site)
CAP site: upstream activator sequence
O: operator (repressor binding site, overlaps P)
lacZ: β-galactosidase (cleaves lactose → glucose + galactose)
lacY: permease (lactose import)
lacA: transacetylase (acetylates toxic galactosides)

REGULATION (FOUR CONDITIONS):
Glucose present, lactose absent: OFF
  Repressor bound to operator → RNAP blocked
  Glucose keeps cAMP low → CAP inactive
Glucose present, lactose present: WEAK ON
  Allolactose binds repressor → repressor released from operator
  But high glucose keeps cAMP low → CAP inactive → low transcription
Glucose absent, lactose absent: OFF
  Repressor bound to operator → no transcription despite CAP being active
Glucose absent, lactose present: MAXIMUM ON
  No glucose → adenylyl cyclase active → high cAMP → CAP-cAMP binds CAP site
  Allolactose → repressor released → operator free
  CAP-cAMP recruits RNAP to promoter → ~50× stimulation of transcription
  Cell uses available lactose as carbon source efficiently
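The four conditions reduce to two independent switches: the repressor (released only when lactose is present) and CAP (active only when glucose is absent). A minimal sketch of that logic follows, treating both sugars as simply present or absent; the function name and return labels are illustrative.

```python
def lac_operon_state(glucose: bool, lactose: bool) -> str:
    """Qualitative lac operon output for a given sugar environment."""
    repressor_bound = not lactose    # allolactose releases the repressor
    cap_active = not glucose         # low glucose -> high cAMP -> CAP-cAMP binds
    if repressor_bound:
        return "OFF"                 # operator occupied; RNAP blocked
    return "MAXIMUM ON" if cap_active else "WEAK ON"

for glucose in (True, False):
    for lactose in (False, True):
        state = lac_operon_state(glucose, lactose)
        print(f"glucose={glucose!s:5} lactose={lactose!s:5} -> {state}")
```

The repressor acts as a veto and CAP as an amplifier, which is why the operon stays off without lactose even when CAP is active.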
Other important prokaryotic regulatory mechanisms include: the trp operon, regulated by a repressor that is activated (not inactivated) by its end-product tryptophan, making it a biosynthetic operon under end-product repression, the opposite of inducible catabolic operons; attenuation, a transcription termination mechanism in which the secondary structure formed by the nascent leader RNA depends on translational coupling and amino acid availability, fine-tuning the termination decision before the structural genes are reached; and riboswitches, RNA elements in the 5′ UTR of mRNAs that directly bind small molecule metabolites, causing conformational changes that alter transcription termination or translation initiation. Riboswitches are a protein-independent regulatory mechanism found mainly in bacteria and, less commonly, in archaea and some eukaryotes.
Eukaryotic Gene Regulation — Transcription Factors, Enhancers, and Chromatin Remodelling
Eukaryotic gene regulation is vastly more complex than prokaryotic regulation, reflecting the greater genomic complexity, the separation of transcription and translation by the nuclear envelope, and the requirements of multicellular development — where thousands of distinct cell types must each express specific subsets of the 20,000+ genes in the genome. Regulation occurs at every step from chromatin structure to post-translational modification, but transcriptional regulation — controlling the rate of RNA Pol II initiation at gene promoters — remains the primary and best-understood level.
Transcription Factors — Sequence-Specific Regulators
Sequence-specific transcription factors (TFs) bind defined DNA sequences through structural domains — zinc fingers, helix-turn-helix, basic leucine zipper (bZIP), basic helix-loop-helix (bHLH) — and activate or repress transcription by recruiting coactivator/corepressor complexes. The human genome encodes approximately 1,600 TFs. Activators recruit the Mediator co-activator complex that bridges TFs to RNA Pol II; they also recruit histone acetyltransferases (HATs) that acetylate histone lysines, relaxing chromatin. Repressors recruit histone deacetylases (HDACs) and histone methyltransferases (HMTs) that compact chromatin and reduce accessibility. TF combinatorial binding — multiple TFs binding to an enhancer — generates the cell-type specificity of gene expression: each TF has broad genomic binding but its combination with cell-type-specific TF partners determines which genes are actually activated.
Enhancers — Remote Control of Transcription
Enhancers are cis-regulatory DNA elements that activate transcription independent of their distance (up to 1 Mb) and orientation relative to the target gene. They function by looping the chromatin to bring bound activators into contact with the promoter-associated preinitiation complex — a process mediated by the Mediator complex and cohesin-facilitated chromatin loops within topologically associating domains (TADs). Enhancers are marked by histone modifications (H3K4me1, H3K27ac), bidirectional transcription (producing enhancer RNAs), and accessible chromatin (detectable by ATAC-seq and DNase-seq). Cell-type specificity of enhancer activity — different TFs binding the same enhancer in different cell types — is a primary mechanism generating cell-type-specific gene expression programmes.
Chromatin Remodelling — Controlling DNA Accessibility
Chromatin remodelling complexes — including SWI/SNF (BAF), ISWI, NuRD, and INO80 families — use ATP hydrolysis to reposition, evict, or restructure nucleosomes, controlling access to underlying DNA sequences. SWI/SNF complexes (mutated in ~20% of human cancers) slide or evict nucleosomes to create accessible chromatin at promoters and enhancers. ISWI complexes typically space nucleosomes to generate regular arrays associated with repressed chromatin. Pioneer transcription factors — a special class of TFs — can bind nucleosomal DNA and recruit remodelling complexes to establish new accessible regions, enabling cell fate transitions and reprogramming. The chromatin accessibility landscape of a cell — its “chromatin state” — determines which genes are available for activation and which are stably silenced.
Post-Transcriptional Regulation
Gene expression is regulated not only at the transcriptional level but also at RNA processing, nuclear export, mRNA stability, and translational efficiency. RNA-binding proteins (RBPs) bind specific sequences in the 5′ UTR, 3′ UTR, or coding sequence of mRNAs, regulating splicing, polyadenylation site choice, mRNA localisation, stability, and translation rate. The iron response element (IRE) / iron-regulatory protein (IRP) system is a paradigmatic example: in low-iron conditions, IRP binds IRE hairpins in the ferritin mRNA 5′ UTR to repress translation, and in the transferrin receptor mRNA 3′ UTR to stabilise the mRNA — coordinating iron uptake and storage from a single post-transcriptional regulatory mechanism without changing transcription.
Phase Separation and Transcriptional Condensates
Recent discoveries have revealed that gene regulation involves liquid-liquid phase separation — the formation of condensate droplets in the nucleus through the concentration of intrinsically disordered regions (IDRs) of TFs and co-activators. Transcriptional condensates at super-enhancers concentrate RNA Pol II, Mediator, and activating TFs, potentially creating high local concentrations that drive burst-like transcriptional activity. Heterochromatin protein 1 (HP1) forms phase-separated condensates at constitutive heterochromatin, contributing to stable silencing. Phase separation may explain how enhancers communicate over long distances in the nucleus — through condensate-mediated concentration of regulatory factors — though the precise relationship between condensates and transcriptional control is still being resolved.
Epigenetics — DNA Methylation, Histone Modifications, and Heritable Gene Regulation
Epigenetics refers to heritable changes in gene expression that do not involve changes in the DNA sequence. Instead, they involve covalent modifications of DNA or histones, or non-covalent changes in chromatin structure, that alter transcriptional activity and are transmitted through cell division. The discovery that identical genomes can produce vastly different cell types (a hepatocyte and a neuron share the same DNA but express very different gene sets) established that epigenetic regulation is essential to differentiation and development. The finding that some epigenetic states can be transmitted across generations (transgenerational epigenetic inheritance, which is well documented in plants and some animals but remains debated in mammals) has potentially profound implications for our understanding of inheritance and the relationship between environment and phenotype.
Key epigenetic marks — associations with transcriptional state
DNA methylation at CpG dinucleotides is catalysed by DNA methyltransferases (DNMT3A and DNMT3B establish de novo methylation; DNMT1 maintains methylation patterns during replication by copying parental strand methylation to newly synthesised strands). Methylation of CpG islands at gene promoters is associated with stable transcriptional silencing. It is used in X-chromosome inactivation (where promoter CpG islands on the inactive X chromosome are methylated), genomic imprinting (where gene expression from one parental allele is silenced by methylation), and cancer (where aberrant CpG island hypermethylation silences tumour suppressor genes). Active DNA demethylation is mediated by the TET enzyme family, which converts 5-methylcytosine to 5-hydroxymethylcytosine and further oxidised forms that are removed by base excision repair.
Recombinant DNA Technology — Restriction Enzymes, Cloning, and Heterologous Expression
Recombinant DNA technology (the set of methods for cutting, joining, copying, and introducing DNA molecules between organisms) transformed biology after 1973, when Herbert Boyer and Stanley Cohen showed that a recombinant plasmid assembled in vitro could replicate in bacterial cells. The following year their groups showed that genes from a toad (Xenopus laevis) could be propagated and transcribed in E. coli from a recombinant plasmid. The necessary tools were already available: restriction enzymes (discovered by Werner Arber, Hamilton Smith, and Daniel Nathans; Nobel Prize in Physiology or Medicine, 1978) to cut DNA at specific sequences; DNA ligase to join DNA fragments; plasmid vectors to replicate foreign DNA in bacteria; and transformation to introduce DNA into cells. The combination created the foundation of modern biotechnology, genetic medicine, and much of contemporary research.
Restriction Enzymes
Type II restriction endonucleases cut dsDNA at specific palindromic recognition sequences (4–8 bp), generating blunt or sticky ends. EcoRI (cuts G↓AATTC), HindIII, BamHI, and hundreds of others provide a molecular toolkit. Methylation by cognate methyltransferases protects bacterial DNA from self-cleavage — the restriction-modification system is bacterial innate immunity against phage DNA.
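As an illustration of site-specific cutting, the following sketch checks that a recognition sequence is palindromic (equal to its own reverse complement) and performs an in-silico digest of a linear sequence with EcoRI. It models only the top strand, and the function names and the `cut_offset` parameter are hypothetical; a cut offset of 1 corresponds to G↓AATTC.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_palindromic(site: str) -> bool:
    """A site is palindromic if it reads the same 5'->3' on both strands."""
    return site == reverse_complement(site)


def digest_linear(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> list:
    """Cut a linear sequence cut_offset bases into every occurrence of site.

    Returns the top-strand fragments in 5'->3' order.
    """
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    print(is_palindromic("GAATTC"))
    print(digest_linear("AAAGAATTCTTTGAATTCGGG"))
```

Each internal fragment begins with AATTC on the top strand, reflecting the 5′ AATT overhang that EcoRI leaves as a sticky end.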
Molecular Cloning
A target DNA fragment is cut with restriction enzymes compatible with the vector’s multiple cloning site (MCS), ligated into the vector with T4 DNA ligase, transformed into competent bacteria, and selected on antibiotic plates. Antibiotic resistance selects for cells that have taken up the plasmid; blue-white screening (lacZ α-complementation) or colony PCR then distinguishes recombinant colonies from those carrying empty vector. Modern alternatives include Gibson assembly (exonuclease-mediated overlap joining), Golden Gate assembly (modular assembly using Type IIS enzymes such as BsaI), and TOPO cloning.
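When setting up the ligation step described above, the amount of insert is usually chosen by molar ratio to the vector, not by mass. The sketch below shows the standard arithmetic; the 3:1 insert-to-vector ratio used as a default is a common starting point and an assumption here, and the function name is illustrative.

```python
def insert_mass_ng(vector_ng: float, vector_bp: int, insert_bp: int,
                   molar_ratio: float = 3.0) -> float:
    """Mass of insert (ng) giving molar_ratio insert:vector molecules.

    For double-stranded DNA, moles are proportional to mass / length,
    so the lengths rescale the vector mass directly.
    """
    if vector_bp <= 0 or insert_bp <= 0:
        raise ValueError("fragment lengths must be positive")
    return vector_ng * (insert_bp / vector_bp) * molar_ratio


if __name__ == "__main__":
    # 50 ng of a 3,000 bp vector with a 1,000 bp insert at 3:1
    print(round(insert_mass_ng(50, 3000, 1000), 1))
```

A shorter insert needs proportionally less mass to reach the same molar excess, which is why mass-matched ligations tend to overload small fragments.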
Heterologous Protein Expression
Cloned genes can be expressed in bacterial (E. coli), yeast, insect (baculovirus/Sf9 cells), mammalian (CHO, HEK293), or cell-free systems to produce recombinant proteins for research, diagnostics, and therapeutics. Insulin (1982), human growth hormone, erythropoietin, and monoclonal antibodies are among the recombinant proteins now produced at industrial scale — enabled entirely by recombinant DNA technology.
PCR, DNA Sequencing, and Gel Electrophoresis — The Core Analytical Toolkit
Three techniques form the experimental backbone of virtually all molecular biology work: PCR for amplifying specific sequences from complex genomic backgrounds; DNA sequencing for reading the nucleotide sequence of amplified products or entire genomes; and gel electrophoresis for separating and visualising DNA, RNA, and protein molecules by size. Together, these methods enable the identification of mutations, the verification of cloned constructs, the expression analysis of genes, the fingerprinting of organisms, and the diagnosis of infectious diseases — all from minute biological samples.
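The reason PCR can work from minute samples is exponential amplification. The sketch below uses the idealised model N = N0 × (1 + E)^n, where E is the per-cycle efficiency (1.0 means perfect doubling). The model ignores the plateau phase seen in real reactions, and the function names are illustrative.

```python
import math


def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealised amplicon count after a number of cycles."""
    return initial_copies * (1.0 + efficiency) ** cycles


def cycles_to_reach(initial_copies: float, target_copies: float,
                    efficiency: float = 1.0) -> int:
    """Smallest whole number of cycles predicted to reach target_copies."""
    if target_copies <= initial_copies:
        return 0
    return math.ceil(math.log(target_copies / initial_copies)
                     / math.log(1.0 + efficiency))


if __name__ == "__main__":
    print(pcr_copies(1, 30))               # perfect doubling from one template
    print(cycles_to_reach(100, 1e9))       # cycles from 100 copies to a billion
    print(cycles_to_reach(100, 1e9, 0.9))  # same target at 90% efficiency
```

Small drops in efficiency compound over many cycles, which is why quantitative PCR assays are calibrated with standard curves instead of assuming perfect doubling.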
CRISPR-Cas9 Genome Editing — From Bacterial Immunity to Precision Medicine
CRISPR-Cas9 (Clustered Regularly Interspaced Short Palindromic Repeats and CRISPR-associated protein 9) is a bacterial adaptive immune system that has been repurposed as arguably the most transformative tool in molecular biology since PCR. It was first characterised as a bacterial defence against phage infection: CRISPR arrays store fragments of phage DNA as an immunological memory, and on reinfection Cas9, guided by a crRNA paired with a tracrRNA, cleaves the matching phage DNA. Its repurposing as a programmable genome editing tool was recognised with the 2020 Nobel Prize in Chemistry, awarded to Jennifer Doudna and Emmanuelle Charpentier.
Guide RNA target
20 nt: length of the spacer sequence in the sgRNA that base-pairs with the genomic target, defining the specificity of Cas9 cleavage
PAM sequence
NGG: protospacer adjacent motif, required immediately 3′ of the target sequence by Cas9 from S. pyogenes (occurring approximately every 8 bp in the human genome)
Cut site
3 bp: position of the Cas9 blunt-end double-strand break upstream of the PAM sequence, within the 20-nt target region
Nobel Prize
2020: Chemistry Nobel awarded to Jennifer Doudna and Emmanuelle Charpentier for developing CRISPR-Cas9 as a genome editing tool
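The targeting rules above (a 20-nt target, an NGG PAM immediately 3′ of it, and a blunt cut 3 bp upstream of the PAM) are enough to sketch a simple target-site finder. The code below scans both strands of a sequence; it does no off-target or efficiency scoring, and the function names and the tuple layout are illustrative assumptions.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_spcas9_sites(seq: str, spacer_len: int = 20) -> list:
    """List candidate SpCas9 sites as (strand, protospacer, pam, cut_index).

    cut_index is a slice boundary in the coordinates of the scanned strand:
    the break falls between cut_index - 1 and cut_index, 3 bp 5' of the PAM.
    """
    seq = seq.upper()
    # Lookahead so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % spacer_len)
    sites = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for match in pattern.finditer(strand_seq):
            protospacer, pam = match.group(1), match.group(2)
            cut_index = match.start() + spacer_len - 3
            sites.append((strand, protospacer, pam, cut_index))
    return sites


if __name__ == "__main__":
    demo = "A" * 20 + "TGG" + "AAAA"
    for site in find_spcas9_sites(demo):
        print(site)
```

Scanning the reverse complement matters in practice because roughly half of the usable NGG motifs lie on the opposite strand.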
CRISPR Applications — From Basic Research to Clinical Trials
Gene Knockout — Loss-of-Function Studies
The simplest CRISPR application: targeting a sgRNA to a coding exon creates an indel via NHEJ repair that frameshifts or truncates the protein — a functional knockout. This replaced the labour-intensive homologous recombination targeting used to create knockout mouse models, reducing timescales from years to weeks. Genome-wide CRISPR screens using pooled sgRNA libraries have mapped the genetic dependencies of cancer cells, identified essential genes, and uncovered drug resistance mechanisms at scale.
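Whether an NHEJ indel produces a knockout depends largely on whether it shifts the reading frame. A minimal sketch of that check follows, taking the net length change of each edited allele (negative for deletions, positive for insertions); the function names are illustrative, and the sketch ignores cases where an in-frame indel still disrupts function.

```python
def classify_indel(length_change: int) -> str:
    """Classify an indel by its net length change in base pairs."""
    if length_change == 0:
        return "no indel"
    # Codons are 3 nt, so only multiples of 3 preserve the downstream frame.
    return "in-frame" if length_change % 3 == 0 else "frameshift"


def frameshift_fraction(length_changes: list) -> float:
    """Fraction of indel-bearing alleles predicted to shift the frame."""
    indels = [change for change in length_changes if change != 0]
    if not indels:
        return 0.0
    shifted = sum(1 for change in indels if classify_indel(change) == "frameshift")
    return shifted / len(indels)


if __name__ == "__main__":
    observed = [-1, 1, -3, 2, 0]
    print([classify_indel(change) for change in observed])
    print(frameshift_fraction(observed))
```

If indel lengths were uniformly distributed, about two thirds would be frameshifts, which is one reason knockout experiments often screen several clones or use more than one sgRNA.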
Base Editing — Single Nucleotide Changes Without Double-Strand Breaks
Base editors, developed by David Liu’s group, fuse a catalytically impaired Cas9 (a nickase) to a deaminase enzyme, enabling conversion of one DNA base to another at the target site without creating a double-strand break. Cytosine base editors (CBEs) convert C•G to T•A; adenine base editors (ABEs) convert A•T to G•C. Because roughly half of known pathogenic point mutations are C•G to T•A transitions, which ABEs can in principle revert, base editors have broad therapeutic potential. Clinical trials using ex vivo base editing of haematopoietic stem cells are ongoing for sickle cell disease and β-thalassaemia.
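Base editors act only within a short window of the protospacer, and every editable base inside that window is a potential target. The sketch below models this with an assumed window of protospacer positions 4 to 8 (counting position 1 as the PAM-distal end), which approximates early cytosine base editors; actual windows vary by editor, and the function name is illustrative.

```python
def apply_base_editor(protospacer: str, source: str, product: str,
                      window: tuple = (4, 8)) -> str:
    """Convert every `source` base inside the editing window to `product`.

    Positions are 1-based from the PAM-distal (5') end of the protospacer.
    The window bounds are an assumption and differ between editors.
    """
    first, last = window
    edited = []
    for position, base in enumerate(protospacer.upper(), start=1):
        if base == source and first <= position <= last:
            edited.append(product)
        else:
            edited.append(base)
    return "".join(edited)


if __name__ == "__main__":
    target = "CAAACAAACAAAAAAAAAAA"
    print(apply_base_editor(target, "C", "T"))  # CBE-style C to T
    print(apply_base_editor(target, "A", "G"))  # ABE-style A to G
```

The second call shows the bystander problem: when several editable bases share the window, all of them may be converted, so guide choice often aims to place only the intended base inside it.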
Prime Editing — Writing Any Change into the Genome
Prime editing (PE), also from David Liu’s group, uses a pegRNA (prime editing guide RNA) that contains both the target-matching spacer sequence and the desired edit sequence, combined with a Cas9-nickase fused to reverse transcriptase. After the pegRNA-directed nick, the reverse transcriptase uses the 3′ extension of the pegRNA as a template to copy the desired edit into the nicked strand — capable of introducing any point mutation, small insertion, or small deletion without double-strand breaks or donor templates. Prime editing substantially expands the range of editable mutations, addressing disease-causing variants that base editing cannot correct.
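The logic of the pegRNA 3′ extension can be sketched in a few lines. The extension has two parts: a primer binding site (PBS) complementary to the 3′ end of the nicked strand, and a reverse transcriptase template that is the reverse complement of the desired edited sequence downstream of the nick. The sketch below returns the extension in DNA letters; the 13-nt default PBS length, the argument names, and the function name are assumptions for illustration, and real designs also tune template length and secondary structure.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def pegrna_extension(pam_strand: str, nick_index: int,
                     edited_downstream: str, pbs_len: int = 13) -> str:
    """Build the pegRNA 3' extension (RT template followed by PBS), 5'->3'.

    pam_strand        : sequence of the PAM-containing (nicked) strand
    nick_index        : slice boundary of the nick on pam_strand
    edited_downstream : desired sequence immediately 3' of the nick,
                        including the edit and flanking homology
    """
    if nick_index < pbs_len:
        raise ValueError("not enough sequence upstream of the nick for the PBS")
    pam_strand = pam_strand.upper()
    # The PBS anneals to the free 3' end created by the nick.
    pbs = reverse_complement(pam_strand[nick_index - pbs_len:nick_index])
    # The template is copied by reverse transcriptase into the edited strand.
    rt_template = reverse_complement(edited_downstream.upper())
    return rt_template + pbs


if __name__ == "__main__":
    strand = "AAAAACCCCCGGGGGTTTTT"
    # Desired edit: second base after the nick changes from G to A.
    print(pegrna_extension(strand, nick_index=10,
                           edited_downstream="GAGGG", pbs_len=5))
```

Reading the output right to left as its complement reproduces the upstream sequence followed by the edited downstream sequence, which is the new strand the reverse transcriptase synthesises.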
Clinical Applications — CRISPR in Human Trials
The first approved CRISPR-based therapeutic, Casgevy (exagamglogene autotemcel), was first authorised by the UK MHRA in November 2023 for sickle cell disease and transfusion-dependent β-thalassaemia. The US FDA approved it for sickle cell disease in December 2023 and for β-thalassaemia in January 2024, and EU conditional marketing authorisation followed in February 2024. Casgevy uses ex vivo CRISPR editing of patients’ haematopoietic stem cells to reactivate fetal haemoglobin by disrupting the erythroid-specific enhancer of BCL11A, restoring functional haemoglobin production and, in most trial participants, eliminating vaso-occlusive crises or the need for regular transfusions. Additional CRISPR therapies are in clinical trials for transthyretin amyloidosis (in vivo liver editing using lipid nanoparticles), acute myeloid leukaemia, metastatic cancers (CRISPR-engineered T cells), and Leber congenital amaurosis (in vivo retinal editing). The field of CRISPR therapeutics is advancing rapidly following these first regulatory approvals.
Molecular Medicine — Genomics, Gene Therapy, and the Clinical Applications of Molecular Biology
The translation of molecular biology knowledge into clinical medicine has produced some of the most significant advances in healthcare of the past three decades — from the molecular diagnosis of inherited diseases and the sequencing of tumour genomes to the development of targeted therapies, RNA-based vaccines, and gene therapies that address the root causes of genetic disease at the molecular level. Understanding the molecular basis of disease and the molecular mechanisms of its treatment is now a core requirement for medical education and clinical practice.
Molecular Biology Across Academic Curricula
Molecular biology features at every level of biology, biochemistry, biomedical science, medicine, pharmacy, and nursing curricula. Introductory courses cover DNA structure, the central dogma, PCR, and gel electrophoresis. Intermediate courses address gene regulation, RNA processing, recombinant DNA technology, and sequencing methods. Advanced and graduate-level courses engage with epigenomics, single-cell genomics, CRISPR applications, RNA therapeutics, and the molecular basis of cancer and inherited disease. Medical curricula integrate molecular biology through genetics, pharmacogenomics, cancer biology, and infectious disease.