Ilyá Kostolomov

From BMIMaster

Supervisors

Trevor Clancy
Eivind Hovig

Areas of interest

Benchmarking, data structures, parallel processing, graph theory, reproducible research, tumor biology.

Areas of proficiency

Python, Scala, Play, Django, Flask; web, graphics and electronic design.

Specifics of the thesis

[Image: Area-of-interest.png]

Personalized cancer treatment is a promising field, enabled by the vast amounts of data that cheaper next-generation sequencing techniques now produce. The task of understanding and representing these data is gradually being solved by a number of specialized databases with public APIs. However, integrating such services for further research and statistical analysis remains a challenge.

I firmly believe that dynamic modelling and purely programmatic frameworks anticipate the future developments in network biology and will allow this science to yield more reproducible results. My thesis will focus on developing a tool that integrates network biology toolkits with various sources of data (mostly genomic, gene regulatory, and protein-protein interaction data). One purpose is to bridge the gap between network biology and dynamic simulation in the temporal domain. The other is to allow rapid programmatic manipulation of the biological networks produced by novel sequencing methods.

I plan to demonstrate the results of my thesis by testing my approach on a cancer-specific dataset (BRAF pathways in melanoma), with applications in the field of personalized cancer treatment.

Terms and definitions

Protein–protein interactions (PPI)
intentional physical contacts established between two or more proteins as a result of biochemical events and/or electrostatic forces
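
As a minimal sketch of how such interactions can be held in memory, the snippet below stores a PPI network as an undirected adjacency map using only the Python standard library. The interaction list is a hand-picked toy example around BRAF, not data from any of the databases listed on this page, and the helper names (build_ppi_network, degree) are hypothetical.

```python
from collections import defaultdict

# Toy interaction pairs for illustration only (assumption: not database output).
INTERACTIONS = [
    ("NRAS", "BRAF"),
    ("BRAF", "RAF1"),
    ("BRAF", "MAP2K1"),
    ("RAF1", "MAP2K1"),
    ("MAP2K1", "MAPK1"),
]


def build_ppi_network(pairs):
    """Return an undirected adjacency map: protein -> set of partners."""
    network = defaultdict(set)
    for a, b in pairs:
        # A physical contact has no direction, so record it both ways.
        network[a].add(b)
        network[b].add(a)
    return dict(network)


def degree(network, protein):
    """Number of distinct interaction partners of a protein."""
    return len(network.get(protein, set()))


ppi = build_ppi_network(INTERACTIONS)

# Rank proteins by number of partners, breaking ties alphabetically.
hubs = sorted(ppi, key=lambda p: (-degree(ppi, p), p))
```

Storing partners in sets keeps duplicate records from different sources from inflating the degree of a protein.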

Tools

networkx
a Python software package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks.
http://networkx.github.io/
cytoscape
an open source software platform for visualizing complex networks and integrating these with any type of attribute data.
http://www.cytoscape.org/
pandas
an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
http://pandas.pydata.org/
redis
an open source, BSD licensed, advanced key-value store. It is often referred to as a data structure server since keys can contain strings, hashes, lists, sets and sorted sets.
http://redis.io/
flask
a web microframework for Python; its built-in development server allows fast prototyping and in-browser visualization of data with the power of jQuery, the DOM and D3.
http://flask.pocoo.org/
PySB
a framework for building mathematical models of biochemical systems as Python programs. PySB abstracts the complex process of creating equations describing interactions among multiple proteins or other biomolecules into a simple and intuitive domain-specific programming language, which is internally translated into BioNetGen or Kappa rules and from there into systems of equations.
http://docs.pysb.org/en/latest/tutorial.html
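
To give a feel for the kind of programmatic network manipulation these tools allow, here is a small sketch using networkx. It assumes networkx is installed; the directed edges are a simplified toy version of RAS/RAF/MEK/ERK signalling chosen for illustration, and knock_out is a hypothetical helper, not part of networkx.

```python
import networkx as nx

# Toy directed signalling edges (activator -> target); illustrative only.
EDGES = [
    ("NRAS", "BRAF"),
    ("NRAS", "RAF1"),
    ("BRAF", "MAP2K1"),
    ("RAF1", "MAP2K1"),
    ("MAP2K1", "MAPK1"),
]


def knock_out(graph, node):
    """Return a copy of the graph with one node and its edges removed."""
    reduced = graph.copy()
    reduced.remove_node(node)
    return reduced


pathway = nx.DiGraph(EDGES)

# Does a signal starting at NRAS still have a route to MAPK1?
signal_reaches_erk = nx.has_path(pathway, "NRAS", "MAPK1")
without_braf = nx.has_path(knock_out(pathway, "BRAF"), "NRAS", "MAPK1")
without_mek = nx.has_path(knock_out(pathway, "MAP2K1"), "NRAS", "MAPK1")
```

In this toy graph, removing BRAF leaves a route through RAF1, while removing MAP2K1 disconnects MAPK1 from NRAS. A real pathway has more redundancy (for example further RAF and MEK isoforms), so the result only shows the mechanics of the query.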

Papers

| Date | Q% | C:N | Description | Source |
| 2014-02-02 | 50% | | Wikipedia/Transcriptome: introduction to the RNAseq pipeline that clearly outlines the difference between DNA and RNA transcriptions and expression profiling in terms of methods and uses. | http://en.wikipedia.org/wiki/Transcriptome |
| | 15% | | Algorithms for RNA Sequencing. First ten pages explain the benefits and specifics of RNAseq. The rest of the presentation is about pipeline algorithms. | http://www.mi.fu-berlin.de/wiki/pub/ABI/ForumSpace/1Lecture.pdf |
| | 0% | | Transcription attenuation in bacteria: theme and variations. | http://bfg.oxfordjournals.org/content/9/2/178.full |
| 2014-02-03 | 15% | Y#1 | It's the machine that matters: Predicting gene function and phenotype from protein networks (Wang, Marcotte) | http://www.ncbi.nlm.nih.gov/pubmed/20637909 |
| | 75% | Y#4 | Principles and Strategies for Developing Network Models in Cancer (Dana Pe'er, Nir Hacohen) | http://www.ncbi.nlm.nih.gov/pubmed/21414479 |
| | 25% | Y#3 | Inferring protein domain interactions from databases of interacting proteins (Riley, Lee, Sabatti, Eisenberg) | http://genomebiology.com/2005/6/10/r89 |
| | 25% | Y#9 | Inferring Domain-Domain Interactions From Protein-Protein Interactions (Deng, Mehta, Sun et al.) | http://www.ncbi.nlm.nih.gov/pubmed/12368246 |
| | 71% | Y#2 | Differential network biology (Trey Ideker, Nevan J. Krogan) | http://www.ncbi.nlm.nih.gov/pubmed/22252388 |
| 2014-02-04 | 0% | | RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome (Li, Dewey) | http://www.biomedcentral.com/1471-2105/12/323 |
| | 0% | | Illumina HiSeq 2000 User Guide (Illumina Proprietary) | http://tinyurl.com/nmzq3mq |
| | 0% | | Hello and welcome to wonderful world of DNA sequencing: explains the instrumental side of sequencing | http://tinyurl.com/nfrcrx4 |
| | 35% | Y#8 | SnapShot: Protein-Protein Interaction Networks (Seebacher, Gavin) | Cell 144, March 18, 2011 |
| 2014-02-05 | 15% | (recent) | The Network Organization of Cancer-associated Protein Complexes in Human Tissues | http://www.nature.com/srep/2013/130408/srep01583/pdf/srep01583.pdf |
| 2014-02-06 | 80% | | Mapping and quantifying mammalian transcriptomes by RNA-Seq | http://www.nature.com/nmeth/journal/v5/n7/full/nmeth.1226.html |
| 2014-02-08 | 10% | * | Cancer signaling networks and their implications for personalized medicine | http://orbit.dtu.dk/fedora/objects/orbit:127655/datastreams/file_72b24f92-20a7-4ba2-b249-458096df5db0/content |
| | 0% | * | Navigating cancer specific attractors for tumor-specific therapy | http://www.nature.com/nbt/journal/v30/n9/pdf/nbt.2345.pdf |
| | 0% | * | Mutations of the BRAF gene in human cancer | http://www.nature.com/nature/journal/v417/n6892/pdf/nature00766.pdf |
| | 0% | * | Computational approaches to identify functional genetic variants in cancer genome | http://individual.utoronto.ca/reimand/paper_pdf/GonzalezPerez_cancer_mutations_NatMeth.pdf |

Quotes

While cancers had been described with some unifying features or hallmarks that to some extent helped our understanding of the disease (Hanahan and Weinberg, 2000, 2011), cancer sequencing revealed a high inter-patient disparity in their tumor genetic mutations (Vogelstein et al., 2013), which led to the hard realization that the interpretation of cancer sequencing data would be the real bottleneck separating generation of new data and generation of new knowledge that could translate into better therapies (Yaffe, 2013).

http://orbit.dtu.dk/fedora/objects/orbit:127655/datastreams/file_72b24f92-20a7-4ba2-b249-458096df5db0/content

Although the carcinogenicity of a particular mutation depends on concurrent genomic alterations in the cell, one can significantly reduce the number of potential driver candidates by determining the functional impact of each mutation. Thus, a key challenge is to distinguish between functional and non-functional mutations, and by extension between those that contribute to tumorigenesis (drivers) and those that do not (passengers).

http://individual.utoronto.ca/reimand/paper_pdf/GonzalezPerez_cancer_mutations_NatMeth.pdf

Given that the majority of somatic mutations reside in non-coding sequence, the need to computationally prioritize them for follow-up functional validation is clear. The recent discovery of melanoma driver mutations in the promoter sequence of telomerase reverse transcriptase (TERT) gene highlights the potential of regulatory variation to drive tumorigenesis [43,44]. As cancer genome projects are moving toward sequencing whole genomes, more non-coding driving mutations will likely be discovered. To facilitate such discoveries more computational method development to score regulatory variants is needed.

http://individual.utoronto.ca/reimand/paper_pdf/GonzalezPerez_cancer_mutations_NatMeth.pdf

Datasets

TCGA Data Matrix (useful as a source of RNAseq data; Skin Cutaneous Melanoma is available)
The Cancer Genome Atlas (TCGA) is a comprehensive and coordinated effort to accelerate the understanding of the molecular basis of cancer through the application of genome analysis technologies, including large-scale genome sequencing. The overarching goal of TCGA is to improve our ability to diagnose, treat and prevent cancer. In addition to collecting and analyzing high-quality tumor samples, TCGA is also attempting to include high-quality non-tumor samples in some assays. The goal is to analyze every participant's germline DNA to establish which abnormalities detected in a tumor sample are peculiar to the oncogenic process. See also the TCGA Data Primer and Wiki.
https://tcga-data.nci.nih.gov/tcga/dataAccessMatrix.htm
European Nucleotide Archive
Whole-exome sequencing of melanoma cell lines
http://www.ebi.ac.uk/ena/data/view/ERS179383
RNAseq atlas
http://medicalgenomics.org/rna_seq_atlas/
Downloadable tables from UCSC genes track
This directory contains the downloadable tables describing the known protein data represented in the UCSC Genes track.
http://hgdownload.soe.ucsc.edu/goldenPath/proteinDB/proteins121210/database/
KEGG PATHWAY Database
Wiring diagrams of molecular interactions, reactions, and relations
http://www.genome.jp/kegg/pathway.html
These diagrams have been assembled by Cell Signaling Technology (CST) scientists and outside experts to provide succinct and current overviews of selected signal transduction pathways.
http://www.cellsignal.com/reference/pathway/ (* inhibition of apoptosis)