#32 Deep tensor factorization and a pitfall for machine learning methods with Jacob Schreiber

the bioinformatics chat

Content provided by Roman Cheplyaka. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Roman Cheplyaka or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://es.player.fm/legal.

5+ years ago • 1:15:14

In this episode, we hear from Jacob Schreiber about his algorithm, Avocado.

Avocado uses deep tensor factorization to decompose a three-dimensional tensor of epigenomic data along its three axes: cell types, assay types, and genomic loci. In doing so, it extracts a low-dimensional, information-rich latent representation from the wealth of experimental data produced by projects like the Roadmap Epigenomics Consortium and ENCODE. This representation lets you impute genome-wide epigenomic experiments that have not yet been performed.
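To make the idea concrete, here is a minimal sketch of the forward pass of a deep tensor factorization model. It is not Avocado's actual code: the toy sizes, the single hidden layer, and the names `cell_emb`, `assay_emb`, `locus_emb`, and `predict` are all assumptions made for illustration, and the weights are random rather than trained. In a real model, the embeddings and the network weights would be fit jointly on the observed entries of the tensor.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (hypothetical, chosen only for illustration).
n_cell_types, n_assays, n_loci = 4, 3, 50
d_cell, d_assay, d_locus, d_hidden = 2, 2, 3, 8

# One latent vector per cell type, per assay, and per genomic locus.
# Here they are random; in practice they are learned from data.
cell_emb = rng.normal(size=(n_cell_types, d_cell))
assay_emb = rng.normal(size=(n_assays, d_assay))
locus_emb = rng.normal(size=(n_loci, d_locus))

# A small feed-forward network combines the three latent vectors.
# It takes the place of the plain product of factors used in
# classical tensor factorization, hence "deep".
W1 = rng.normal(size=(d_cell + d_assay + d_locus, d_hidden)) * 0.1
b1 = np.zeros(d_hidden)
W2 = rng.normal(size=(d_hidden, 1)) * 0.1
b2 = np.zeros(1)


def predict(cell, assay, loci):
    """Predict the signal of one (cell type, assay) track at the given loci."""
    loci = np.asarray(loci)
    x = np.concatenate(
        [
            np.broadcast_to(cell_emb[cell], (loci.size, d_cell)),
            np.broadcast_to(assay_emb[assay], (loci.size, d_assay)),
            locus_emb[loci],
        ],
        axis=1,
    )
    hidden = np.maximum(x @ W1 + b1, 0.0)  # ReLU activation
    return (hidden @ W2 + b2).ravel()


# Any (cell type, assay) pair can be queried, including pairs for which
# no experiment exists; that is what imputation amounts to here.
imputed_track = predict(cell=1, assay=2, loci=np.arange(n_loci))
print(imputed_track.shape)
```

Because every prediction is a function of three independent lookups, an unperformed experiment is imputed simply by combining a cell type embedding and an assay embedding that were each learned from other experiments.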

Jacob also talks about a pitfall he discovered when trying to predict gene expression from a mix of genomic and epigenomic data. As you increase the complexity of a machine learning model, its performance may improve for the wrong reason: instead of learning something biologically interesting, the model may simply use the nucleotide sequence to recognize each gene and memorize that gene's average expression across the training cell types.
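One way to check for this pitfall is to compare a model against a trivial baseline that predicts, for each gene, its average expression across the training cell types. The sketch below uses synthetic data, so every number in it (gene count, noise scales) is an assumption chosen for illustration, not a result from the episode. It shows that when most of the variance in expression is between genes rather than between cell types, this baseline already correlates strongly with a held-out cell type.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_train_cells = 500, 5

# Synthetic expression: a strong gene-specific mean plus a weaker
# cell-type-specific deviation (both scales are assumptions).
gene_mean = rng.normal(scale=2.0, size=n_genes)
train_expr = gene_mean[:, None] + rng.normal(
    scale=0.5, size=(n_genes, n_train_cells)
)
test_expr = gene_mean + rng.normal(scale=0.5, size=n_genes)

# Trivial baseline: ignore the held-out cell type entirely and predict
# each gene's average expression across the training cell types.
baseline = train_expr.mean(axis=1)
baseline_corr = np.corrcoef(baseline, test_expr)[0, 1]
print(f"baseline correlation on held-out cell type: {baseline_corr:.3f}")

# What remains after subtracting the baseline is the cell-type-specific
# part of the signal, which is the biologically interesting target.
residual_target = test_expr - baseline
```

A model that only matches this baseline has learned nothing about the held-out cell type. Evaluating on `residual_target`, or reporting the baseline alongside the model, makes that visible.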

If you enjoyed this episode, please consider supporting the podcast on Patreon.
