Internships / Graduation Projects
In collaboration with the Microarray Facility of the VU University Medical
Center, Amsterdam, and Statistics for Life Sciences, Amsterdam, we offer a
number of exciting 3- or 6-month internship projects in bioinformatics and/or
biostatistics.
In the past, we have had several Dutch and international students
(England, Spain, Austria) who very much enjoyed their projects and our
active research environment. Most of them started excellent jobs after
their internships, either in industry or as PhD students.
Please do not hesitate to contact me if you are interested: mark.vdwiel[at]vumc.nl
Topics: meta-clustering of tumor array CGH data, statistical quality control
for microarrays (control charts), application of extreme value theory to
detect aberrations in human DNA, multiple testing, etc.
A sample of the projects
1. Identification of Colorectal Cancer Related Expression Signatures
Aim: Colorectal cancer (CRC) is
the second leading cause of cancer death in the Netherlands. In 2003, 4429
patients died from CRC and by 2015 this number is expected to increase to 5300.
It is generally accepted that secondary prevention (early
detection of pre-malignant lesions) is the most realistic approach for reducing
this high number of colorectal cancer deaths. We aim to identify expression
patterns from blood that: 1) predict who is at high risk of developing CRC
(‘pre-test likelihood’); and 2) identify people who have developed CRC.
Plan of Work: We obtained expression profiles of intestines and blood from different
mouse inbred strains as well as of human blood from CRC patients and controls,
using Agilent expression arrays. Comparisons have to be made between: 1) colon-derived
expression profiles from different mouse inbred strains; 2) mouse colon and
mouse blood samples; 3) mouse blood samples and human blood samples; and 4)
human blood samples from CRC patients and controls. These analyses
should reveal expression signatures that reflect variation in genetic
susceptibility to CRC (mouse/human), and presence of (metastatic) CRC cells in
the blood circulation (human).
2. Implementation of methods that select classifiers from microarray
profiles to predict treatment outcome of cancer patients
In classification
analysis (also called discriminant analysis or supervised learning) a
classifier is constructed. A classifier assigns objects to pre-specified and
unordered classes on the basis of measurements made on these objects. For
instance, Golub et al. built a classifier based on microarray expression
profiles. The classifier distinguishes acute lymphoid leukemia from acute
myeloid leukemia, and it does this with 100% accuracy (1). Chromosomal copy
numbers, measured with a microarray technique called “array CGH”, can also be
used to identify markers that classify tumors (1-2). A combination of
preprocessing procedures (3) and statistical testing (4) identified potential
markers for different head and neck tumors (5).
These markers were, however, identified independently of one another, whereas
we wish to combine them to construct a good classifier. Such a classifier
should predict clinical outcome well from array profiles. The candidate will
explore and test various existing classification tools such as PAM. The
classification techniques are to be applied to datasets from our own laboratory.
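To make the idea of a classifier concrete, here is a minimal sketch of a plain nearest-centroid classifier in Python. It is not PAM itself and not the method used in our laboratory: PAM additionally shrinks the class centroids to select features, which is omitted here. The function names, class labels and toy profiles are all made up for illustration.

```python
from statistics import mean

def fit_centroids(profiles, labels):
    """Return {class label: per-feature mean profile} for the training data."""
    by_class = {}
    for profile, label in zip(profiles, labels):
        by_class.setdefault(label, []).append(profile)
    return {
        label: [mean(column) for column in zip(*rows)]
        for label, rows in by_class.items()
    }

def predict(centroids, profile):
    """Assign a profile to the class whose centroid is closest
    (squared Euclidean distance)."""
    def sq_dist(centroid):
        return sum((x - c) ** 2 for x, c in zip(profile, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

# Toy training data (hypothetical): 4 samples, 3 array features each.
train_profiles = [
    [2.0, 0.1, 1.0],
    [1.8, 0.0, 1.2],
    [0.2, 1.9, 1.1],
    [0.0, 2.1, 0.9],
]
train_labels = ["responder", "responder", "non-responder", "non-responder"]

centroids = fit_centroids(train_profiles, train_labels)
prediction = predict(centroids, [1.7, 0.3, 1.0])
print(prediction)
```

In a real project the accuracy of such a classifier would have to be estimated on samples not used for training, for example by cross-validation.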
3. Detection of potential genomic markers: permutation tests for array CGH
analysis
After
pre-processing, array CGH basically results in chromosomal copy number counts
for thousands of chromosomal locations (clones). Since we have two copies of
each chromosome, two copies is normal, but at some locations gains (> 2
copies) or losses (< 2 copies) may occur. These locations are of special
interest. Often, biologists are interested in locations that show different
proportions of gains (or losses) between two or more groups of samples, where
the groups correspond to a clinical variable, e.g. tumor status. Naturally,
fluctuations between these proportions always occur, so one needs statistical
tests to decide which differences are
probably real.
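As a small illustration of the quantities involved, the sketch below calls a gain, loss or normal state from a copy number and computes the proportion of gains per group at a single clone. It assumes integer copy numbers as described above; the function names, group labels and data are hypothetical.

```python
def call_state(copy_number):
    """Classify a copy number as 'gain' (> 2), 'loss' (< 2) or 'normal' (2)."""
    if copy_number > 2:
        return "gain"
    if copy_number < 2:
        return "loss"
    return "normal"

def gain_proportions(copy_numbers, groups):
    """Proportion of samples showing a gain at one clone, per group."""
    counts = {}
    for cn, group in zip(copy_numbers, groups):
        gains, total = counts.get(group, (0, 0))
        counts[group] = (gains + (call_state(cn) == "gain"), total + 1)
    return {group: gains / total for group, (gains, total) in counts.items()}

# Hypothetical copy numbers at a single clone for 8 samples in two groups.
copy_numbers = [3, 3, 2, 4, 2, 2, 1, 3]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

proportions = gain_proportions(copy_numbers, groups)
print(proportions)
```

Whether a difference between two such proportions is more than chance fluctuation is exactly the question the statistical test has to answer.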
Statistical
permutation tests are widely available for various situations, but the
multiplicity of the data hampers an efficient implementation. The necessary
multiple testing corrections imply that very small p-values need to be computed
accurately. We showed how to do so for a two-group setting and applied the
program to head-and-neck tumor data. You will generalize the program to an application
which may be used in several situations (paired data, more than two groups,
contingency tables, etc.). The application should be reasonably efficient and
user-friendly, with a user interface and visualisations. You will test the application
on several in-house data sets such as gastric tumor data and head and neck
tumor data.
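For orientation, the following is a minimal sketch of an exact two-group permutation test for a difference in gain proportions at one clone, followed by a Bonferroni correction. It is not our existing program: it simply enumerates all group assignments, which is only feasible for small samples, whereas the actual application needs a far more efficient algorithm. The function names and data are hypothetical, and Bonferroni is used here only as the simplest example of a multiple testing correction.

```python
from fractions import Fraction
from itertools import combinations

def exact_permutation_pvalue(gains_a, gains_b):
    """Two-sided exact permutation p-value for a difference in gain proportions.

    gains_a, gains_b: lists of 0/1 indicators (1 = gain) for the two groups.
    """
    pooled = gains_a + gains_b
    n_a, n_b = len(gains_a), len(gains_b)
    total = sum(pooled)

    def statistic(sum_a):
        # Absolute difference in gain proportions, kept exact with Fractions.
        return abs(Fraction(sum_a, n_a) - Fraction(total - sum_a, n_b))

    observed = statistic(sum(gains_a))
    extreme = 0
    n_assignments = 0
    # Enumerate every way of choosing which samples form group A.
    for idx in combinations(range(len(pooled)), n_a):
        n_assignments += 1
        if statistic(sum(pooled[i] for i in idx)) >= observed:
            extreme += 1
    return Fraction(extreme, n_assignments)

def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(Fraction(1), p_value * n_tests)

# Hypothetical clone: gains in all 5 samples of group A, none in group B.
gains_a = [1, 1, 1, 1, 1]
gains_b = [0, 0, 0, 0, 0]

p = exact_permutation_pvalue(gains_a, gains_b)
print(p, float(p))
print(bonferroni(p, 1000))
```

With five samples per group, the smallest attainable p-value is 1/126, which does not survive a correction for thousands of clones. This shows why sample size and accurate computation of very small p-values matter in this setting.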