Ilyá Kostolomov




Trevor Clancy
Eivind Hovig

Areas of interest

Benchmarking, data structures, parallel processing, graph theory, reproducible research, tumor biology.

Areas of proficiency

Python, Scala, Play, Django, Flask; web, graphics and electronic design.

Specifics of the thesis


Personalized cancer treatment is a promising field, enabled by the vast amounts of data that cheaper next-generation sequencing has made available. The task of understanding and representing these data is gradually being addressed by a number of specialized databases with public APIs. However, integrating such services for further research and statistical analysis remains a challenge.

I firmly believe that dynamic modelling and purely programmatic frameworks anticipate future developments in network biology and will allow the field to yield more reproducible results. My thesis will focus on developing a tool that integrates network biology toolkits with various sources of data (mostly genomic, gene regulatory, and protein-protein interaction data). One purpose is to bridge the gap between network biology and dynamic simulation in the temporal domain. The other is to allow rapid programmatic manipulation of biological networks derived from novel sequencing methods.

I plan to demonstrate the results of my thesis by testing my approach on a cancer-specific dataset (BRAF pathways in melanoma), with applications in personalized cancer treatment.
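As a rough illustration of the integration goal described above, the sketch below merges two kinds of records (an interaction list and per-gene expression values) into one annotated network and then filters it. It is only a minimal sketch under stated assumptions: the function names `build_network` and `subnetwork` are hypothetical, the expression values are made up, and the two in-memory collections merely stand in for what separate public APIs might return.

```python
from collections import defaultdict

# Illustrative stand-ins for records fetched from two separate data sources.
# The gene pairs are a toy edge list; the expression values are invented.
interactions = [
    ("NRAS", "BRAF"),
    ("BRAF", "MAP2K1"),
    ("MAP2K1", "MAPK1"),
    ("BRAF", "RAF1"),
]
expression = {"NRAS": 8.1, "BRAF": 12.4, "MAP2K1": 9.7, "MAPK1": 10.2}


def build_network(edges, node_values):
    """Merge an edge list and per-gene values into one annotated network."""
    network = defaultdict(lambda: {"expression": None, "neighbors": set()})
    for a, b in edges:
        # Interactions are treated as undirected.
        network[a]["neighbors"].add(b)
        network[b]["neighbors"].add(a)
    for gene, value in node_values.items():
        network[gene]["expression"] = value
    return dict(network)


def subnetwork(network, min_expression):
    """Keep genes measured at or above min_expression and the edges among them."""
    kept = {
        gene
        for gene, data in network.items()
        if data["expression"] is not None and data["expression"] >= min_expression
    }
    return {gene: sorted(network[gene]["neighbors"] & kept) for gene in sorted(kept)}


network = build_network(interactions, expression)
high = subnetwork(network, min_expression=9.0)
print(high)
```

Genes without a measurement (here RAF1) keep `expression` set to `None` and are dropped by the filter rather than being treated as zero, which avoids silently mixing missing and low values.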

Terms and definitions

Protein–protein interactions (PPI)
intentional physical contacts established between two or more proteins as a result of biochemical events and/or electrostatic forces


NetworkX
a Python software package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks.
Cytoscape
an open source software platform for visualizing complex networks and integrating these with any type of attribute data.
pandas
an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
Redis
an open source, BSD-licensed, advanced key-value store. It is often referred to as a data structure server since keys can contain strings, hashes, lists, sets and sorted sets.
a web microframework whose built-in server allows fast prototyping and in-browser visualization of data, using the power of jQuery, the DOM and D3.
PySB
a framework for building mathematical models of biochemical systems as Python programs. PySB abstracts the complex process of creating equations describing interactions among multiple proteins or other biomolecules into a simple and intuitive domain-specific programming language, which is internally translated into BioNetGen or Kappa rules and from there into systems of equations.
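To make the notion of a protein-protein interaction network concrete, here is a minimal sketch that stores interactions as an undirected graph and answers two basic structural questions: how many partners each protein has, and what the shortest chain of interactions between two proteins is. It deliberately uses only the Python standard library so that it is self-contained; a dedicated network package of the kind listed above would normally provide these operations. The helper names and the toy edge list are assumptions made for illustration.

```python
from collections import deque


def make_graph(pairs):
    """Build an undirected adjacency mapping from (protein, protein) pairs."""
    graph = {}
    for a, b in pairs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def degree(graph):
    """Number of interaction partners per protein."""
    return {protein: len(partners) for protein, partners in graph.items()}


def shortest_path(graph, source, target):
    """Breadth-first search; returns a list of proteins, or None if unreachable."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the predecessor links back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in sorted(graph.get(node, ())):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


# Toy edge list for illustration only.
ppi = make_graph([
    ("NRAS", "BRAF"),
    ("BRAF", "MAP2K1"),
    ("MAP2K1", "MAPK1"),
    ("BRAF", "RAF1"),
])
degrees = degree(ppi)
path = shortest_path(ppi, "NRAS", "MAPK1")
print(degrees)
print(path)
```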


Date | Q% | C:N | Description / Source
2014-02-02 | 50% | | Wikipedia/Transcriptome: introduction to the RNAseq pipeline that clearly outlines the difference between DNA and RNA transcriptions and expression profiling in terms of methods and uses.
 | 15% | | Algorithms for RNA Sequencing. First ten pages explain the benefits and specifics of RNAseq. The rest of the presentation is about pipeline algorithms.
 | 0% | | Transcription attenuation in bacteria: theme and variations.
2014-02-03 | 15% | Y#1 | It's the machine that matters: Predicting gene function and phenotype from protein networks (Wang, Marcotte)
 | 75% | Y#4 | Principles and Strategies for Developing Network Models in Cancer (Dana Pe'er, Nir Hacohen)
 | 25% | Y#3 | Inferring protein domain interactions from databases of interacting proteins (Riley, Lee, Sabatti, Eisenberg)
 | 25% | Y#9 | Inferring Domain-Domain Interactions From Protein-Protein Interactions (Deng, Mehta, Sun et al.)
 | 71% | Y#2 | Differential network biology (Trey Ideker, Nevan J. Krogan)
2014-02-04 | 0% | | RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome (Li, Dewey)
 | 0% | | Illumina HiSeq 2000 User Guide (Illumina Proprietary)
 | 0% | | Hello and welcome to wonderful world of DNA sequencing: explains the instrumental side of sequencing
 | 35% | Y#8 | SnapShot: Protein-Protein Interaction Networks (Seebacher, Gavin), Cell 144, March 18, 2011
2014-02-05 | 15% | (recent) | The Network Organization of Cancer-associated Protein Complexes in Human Tissues
2014-02-06 | 80% | | Mapping and quantifying mammalian transcriptomes by RNA-Seq
2014-02-08 | 10% | * | Cancer signaling networks and their implications for personalized medicine
 | 0% | * | Navigating cancer specific attractors for tumor-specific therapy
 | 0% | * | Mutations of the BRAF gene in human cancer
 | 0% | * | Computational approaches to identify functional genetic variants in cancer genome


While cancers have been described in terms of unifying features, or hallmarks, that to some extent helped our understanding of the disease (Hanahan and Weinberg, 2000, 2011), cancer sequencing revealed a high inter-patient disparity in tumor genetic mutations (Vogelstein et al., 2013). This led to the hard realization that interpreting cancer sequencing data would be the real bottleneck between generating new data and generating new knowledge that could translate into better therapies (Yaffe, 2013).

Although the carcinogenicity of a particular mutation depends on concurrent genomic alterations in the cell, one can significantly reduce the number of potential driver candidates by determining the functional impact of each mutation. Thus, a key challenge is to distinguish between functional and non-functional mutations, and by extension between those that contribute to tumorigenesis (drivers) and those that do not (passengers).
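The filtering step described above can be sketched as follows: given a functional-impact score per mutation, keep only those above a cutoff and rank them as driver candidates. This is a hedged illustration, not a published scoring method. The `Mutation` record, the `candidate_drivers` function, the 0.5 threshold and every score are hypothetical, and the `GENE_*` entries are placeholders rather than real variants.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mutation:
    gene: str
    change: str
    impact: float  # hypothetical functional-impact score in [0, 1]


def candidate_drivers(mutations, threshold=0.5):
    """Return mutations scoring at or above threshold, highest impact first."""
    kept = [m for m in mutations if m.impact >= threshold]
    return sorted(kept, key=lambda m: m.impact, reverse=True)


# Made-up scores for illustration; only the gene/change labels of the first
# entry refer to a real, well-known melanoma mutation.
mutations = [
    Mutation("BRAF", "V600E", 0.95),
    Mutation("GENE_A", "variant_1", 0.10),
    Mutation("GENE_B", "variant_2", 0.62),
    Mutation("GENE_C", "variant_3", 0.30),
]

candidates = candidate_drivers(mutations)
for m in candidates:
    print(m.gene, m.change, m.impact)
```

A single threshold is the simplest possible rule. Because carcinogenicity also depends on concurrent alterations in the same cell, such a score can only narrow the candidate list, not settle which mutations are drivers.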

Given that the majority of somatic mutations reside in non-coding sequence, the need to computationally prioritize them for follow-up functional validation is clear. The recent discovery of melanoma driver mutations in the promoter sequence of the telomerase reverse transcriptase (TERT) gene highlights the potential of regulatory variation to drive tumorigenesis [43,44]. As cancer genome projects move toward sequencing whole genomes, more non-coding driver mutations will likely be discovered. To facilitate such discoveries, more computational methods for scoring regulatory variants need to be developed.


TCGA Data Matrix (useful as source of RNAseq, Skin Cutaneous Melanoma available)
The Cancer Genome Atlas (TCGA) is a comprehensive and coordinated effort to accelerate the understanding of the molecular basis of cancer through the application of genome analysis technologies, including large-scale genome sequencing. Its overarching goal is to improve our ability to diagnose, treat and prevent cancer. In addition to collecting and analyzing high-quality tumor samples, TCGA is attempting to include high-quality non-tumor samples in some assays, with the aim of analyzing every participant's germline DNA to establish which abnormalities detected in a tumor sample are peculiar to the oncogenic process. See also: TCGA Data Primer and Wiki
European Nucleotide Archive
Whole-exome sequencing of melanoma cell lines
RNAseq atlas
Downloadable tables from UCSC genes track
This directory contains the downloadable tables describing the known protein data represented in the UCSC Genes track.
KEGG PATHWAY Database Wiring diagrams of molecular interactions, reactions, and relations
These diagrams have been assembled by Cell Signaling Technology (CST) scientists and outside experts to provide succinct and current overviews of selected signal transduction pathways. (* inhibition of apoptosis)