Internships / Graduation Projects

In collaboration with the Microarray Facility of the VU University Medical Center, Amsterdam, and Statistics for Life Sciences, Amsterdam, we offer a number of exciting 3- or 6-month internship projects in bioinformatics and/or biostatistics.

In the past, we have hosted several Dutch and international students (from England, Spain and Austria) who very much enjoyed their projects and our active research environment. Most of them went on to excellent jobs after their internships, either in industry or as PhD students.

Please do not hesitate to contact me if you are interested: mark.vdwiel[at]vumc.nl

Topics: meta-clustering of tumor array CGH data, statistical quality control for microarrays (control charts), application of extreme value theory to detect aberrations in human DNA, multiple testing, etc.

A sample of the projects

1. Identification of Colorectal Cancer Related Expression Signatures
Aim:
Colorectal cancer (CRC) is the second leading cause of cancer death in the Netherlands. In 2003, 4429 patients died from CRC, and by 2015 this number is expected to increase to 5300. It is generally accepted that secondary prevention (early detection of pre-malignant lesions) is the most realistic approach to reducing this high number of colorectal cancer deaths. We aim to identify expression patterns from blood that 1) predict who is at high risk of developing CRC (‘pre-test likelihood’), and 2) identify people who have developed CRC.
Plan of Work:
We obtained expression profiles of intestines and blood from different mouse inbred strains, as well as of human blood from CRC patients and controls, using Agilent expression arrays. Comparisons have to be made between 1) colon-derived expression profiles from different mouse inbred strains; 2) mouse colon and mouse blood samples; 3) mouse blood samples and human blood samples; and 4) human blood samples from CRC patients and controls. These analyses should reveal expression signatures that reflect variation in genetic susceptibility to CRC (mouse/human) and the presence of (metastatic) CRC cells in the blood circulation (human).

2. Implementation of methods that select classifiers from microarray profiles to predict treatment outcome of cancer patients
In classification analysis (also called discriminant analysis or supervised learning) a classifier is constructed. A classifier assigns objects to pre-specified, unordered classes on the basis of measurements made on these objects. For instance, in Golub et al. a classifier is built from microarray expression profiles. The classifier distinguishes Acute Lymphoid Leukemia from Acute Myeloid Leukemia, and it does so with 100% accuracy (1). Chromosomal copy numbers can also be used to identify markers that classify tumors, using a microarray technique called “array CGH” (1-2). A combination of preprocessing procedures (3) and statistical testing (4) identified potential markers for different head and neck tumors (5).
These markers are, however, identified one at a time, whereas we wish to combine them into a good classifier. Such a classifier should predict clinical outcome well from array profiles. The candidate will explore and test various existing classification tools, such as PAM. The classification techniques are to be applied to datasets from our own laboratory.
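
To give a feel for how several markers can be combined into one classifier, here is a minimal Python sketch of a nearest-shrunken-centroid rule, the idea behind PAM. It is a simplification and not the PAM software itself: the real method also standardizes each feature by its within-class standard deviation, and the threshold is normally chosen by cross-validation. The function names, the toy profiles and the value of delta are all made up for illustration.

```python
from statistics import mean

def soft_threshold(value, delta):
    """Shrink value toward zero by delta; values within delta of zero become 0."""
    if value > delta:
        return value - delta
    if value < -delta:
        return value + delta
    return 0.0

def fit_shrunken_centroids(profiles, labels, delta):
    """Return {class: shrunken centroid} for a list of equal-length profiles."""
    n_features = len(profiles[0])
    overall = [mean(p[j] for p in profiles) for j in range(n_features)]
    centroids = {}
    for cls in sorted(set(labels)):
        members = [p for p, lab in zip(profiles, labels) if lab == cls]
        # Shrink each class offset toward the overall centroid (unstandardized).
        centroids[cls] = [
            overall[j] + soft_threshold(mean(m[j] for m in members) - overall[j], delta)
            for j in range(n_features)
        ]
    return centroids

def predict(centroids, profile):
    """Assign the profile to the class with the nearest (squared Euclidean) centroid."""
    def sq_dist(centroid):
        return sum((x - y) ** 2 for x, y in zip(profile, centroid))
    return min(centroids, key=lambda cls: sq_dist(centroids[cls]))

# Toy data (invented): feature 0 separates the classes, features 1 and 2 are noise.
profiles = [[2.0, 0.1, -0.1], [1.8, -0.1, 0.1], [-2.0, 0.1, 0.0], [-1.9, -0.1, 0.0]]
labels = ["good", "poor", "poor", "poor"]
labels = ["good", "good", "poor", "poor"]
centroids = fit_shrunken_centroids(profiles, labels, delta=0.5)
predicted = predict(centroids, [1.5, 0.0, 0.0])
```

With this threshold the two noise features get identical centroid values in both classes, so only feature 0 influences the prediction. This is how shrinkage doubles as marker selection.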

3. Detection of potential genomic markers: permutation tests for array CGH analysis
After pre-processing, array CGH basically results in chromosomal copy number counts for thousands of chromosomal locations (clones). Since we have two copies of each chromosome, two copies is normal, but at some locations gains (> 2 copies) or losses (< 2 copies) may occur. These locations are of special interest. Often, biologists are interested in locations that show different proportions of gains (or losses) between two or more groups of samples, where the groups correspond to a clinical variable, e.g. tumor status. Naturally, these proportions always fluctuate to some extent, so one needs statistical tests to decide which differences are probably real.
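
As an illustration of such a test for a single clone, the following Python sketch estimates a two-sided permutation p-value for the difference in gain proportions between two groups. It is a plain Monte Carlo version written for this page, not the program developed in our group. The function name, the number of permutations and the toy data are assumptions.

```python
import random

def permutation_pvalue(gains, groups, n_perm=10000, seed=0):
    """Two-sided Monte Carlo permutation p-value for a difference in gain proportions.

    gains  : list of 0/1 flags, 1 if the sample shows a gain at this clone
    groups : list of 0/1 group labels, same length as gains
    """
    def abs_diff(labels):
        g1 = [g for g, lab in zip(gains, labels) if lab == 1]
        g0 = [g for g, lab in zip(gains, labels) if lab == 0]
        return abs(sum(g1) / len(g1) - sum(g0) / len(g0))

    rng = random.Random(seed)
    observed = abs_diff(groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # random relabelling of the samples
        if abs_diff(shuffled) >= observed - 1e-12:  # tolerance for rounding
            hits += 1
    # The add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)

# Toy data (invented): 7 of 8 samples gained in group 1, 1 of 8 in group 0.
gains = [1, 1, 1, 1, 1, 1, 1, 0] + [1, 0, 0, 0, 0, 0, 0, 0]
groups = [1] * 8 + [0] * 8
p_value = permutation_pvalue(gains, groups)
```

Shuffling the group labels preserves both group sizes, so every relabelling is a valid draw under the null hypothesis that gain status is unrelated to group membership.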

Statistical permutation tests are widely available for various situations, but the multiplicity of the data hampers an efficient implementation. The necessary multiple testing corrections imply that very small p-values need to be computed accurately. We showed how to do so for a two-group setting and applied the program to head-and-neck tumor data. You will generalize the program into an application that can be used in several situations (paired data, more than two groups, contingency tables, etc.). The application should be reasonably efficient and user-friendly, with a user interface and visualisations. You will test the application on several in-house data sets, such as gastric tumor data and head and neck tumor data.
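
To see why accuracy at very small p-values matters: a Monte Carlo permutation test with B random relabellings cannot report a p-value below 1/(B+1), while a Bonferroni correction over, say, 5000 clones at level 0.05 requires p-values below 0.00001. For the simplest case (gain versus no gain in two groups) the permutation distribution is hypergeometric and can be summed exactly. The Python sketch below shows this. It is only one possible approach and not necessarily the method used in our program. The function name, the toy counts and the numbers 5000 and 0.05 are assumptions.

```python
from math import comb

def exact_two_group_pvalue(gains_1, n_1, gains_0, n_0):
    """Exact two-sided permutation p-value for a difference in gain proportions.

    Under random relabelling, the number of gains falling in group 1 is
    hypergeometric, so the permutation distribution is summed without sampling.
    """
    total_gains = gains_1 + gains_0
    total = n_1 + n_0
    observed = abs(gains_1 / n_1 - gains_0 / n_0)
    favourable = 0
    lowest = max(0, total_gains - n_0)
    highest = min(n_1, total_gains)
    for k in range(lowest, highest + 1):
        diff = abs(k / n_1 - (total_gains - k) / n_0)
        if diff >= observed - 1e-12:  # tolerance for rounding
            # Number of relabellings that put exactly k gains in group 1.
            favourable += comb(total_gains, k) * comb(total - total_gains, n_1 - k)
    return favourable / comb(total, n_1)

# Toy counts (invented): 19 of 20 samples gained in group 1, 1 of 20 in group 0.
p_exact = exact_two_group_pvalue(19, 20, 1, 20)

# Bonferroni threshold for an assumed 5000 clones at overall level 0.05.
bonferroni_threshold = 0.05 / 5000
is_significant = p_exact < bonferroni_threshold
```

The exact sum uses integer arithmetic until the final division, so p-values far below the Bonferroni threshold come out without Monte Carlo error. Paired data or more than two groups need a different enumeration, which is part of the project.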