Key Concepts in Bio-Statistics for Data Scientists

Prasad Bhave
4 min read · Dec 27, 2019


It is the final week of 2019, and most people, myself included, have already moved on to 2020: reviewing 2019, cherishing its learnings and achievements, and planning for a better 2020!

Data Analytics, Machine Learning, Artificial Intelligence and Big Data are the buzzwords in almost every industry vertical, and even more so in Healthcare, Life Sciences and the Pharmaceutical industry. I have often been asked by younger professionals which concepts in Bio-statistics, epidemiology and data analytics one really needs to master. This final article of the year is, in part, my set of recommendations and quick tips for young professionals in data science and applied statistics, especially those venturing into healthcare data analytics as healthcare, life sciences or clinical scientists. These are learnings gathered from experience and practice in Bio-statistics, epidemiology and population health management, working with real-world healthcare and clinical data. They are intended more as pointers for learning.
Here are my top 5 tips and takeaways:

1) Know the normal distribution (and its standardized form, the standard normal) well, and know why it became the standard modeling tool. Even more important, know its limitations with healthcare data, which are often skewed (costs and lengths of stay, for example)!
2) Though taught at the STAT101 level, it is often forgotten (or overlooked by marketing folks): correlation is NOT causation.
3) Know and clearly understand the main types of statistical distributions. Try fitting several candidate distributions to your data, then look for the one that fits well.

4) The Chi-square test is one test that will take you a long way in healthcare data analytics, epidemiology and even pure statistics.

5) This is my favorite: the Central Limit Theorem (CLT). If someone asks me for the one concept that applies across the different domains of analytics and data science, occupies the central place in hypothesis testing and predictive analytics, and is leveraged the most, it is the CLT! Period!
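
To make tip 4 concrete, here is a minimal sketch of a Pearson Chi-square test of independence on a 2x2 table, written with only the Python standard library. The function name `chi_square_2x2` and the counts are hypothetical, invented purely for illustration, and no continuity correction is applied. Treat it as a sketch of the idea rather than a replacement for a statistics package.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table of counts.

    Returns (statistic, p_value). No continuity correction is applied.
    """
    (a, b), (c, d) = table
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    total = a + b + c + d

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected

    # A 2x2 table has 1 degree of freedom, where the chi-square tail
    # probability has a closed form in terms of the error function.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical counts, invented for illustration only:
# rows = treated / control, columns = improved / not improved.
counts = [[30, 20], [18, 32]]
stat, p = chi_square_2x2(counts)
print(f"chi-square = {stat:.3f}, p = {p:.4f}")
```

A small p-value suggests that the row and column variables are associated, which (see tip 2) is still not evidence of causation. The approximation is usually considered reliable only when every expected count is about 5 or more; with sparser tables, an exact test such as Fisher's is the usual alternative.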

What is the CLT and why is it important? (Also see the link for a tutorial.)

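  • The central limit theorem (CLT) states that the distribution of sample means approximates a normal distribution as the sample size gets larger, whatever the shape of the population distribution (provided it has a finite variance).
  • Sample sizes of 30 or more are commonly treated as sufficient for the CLT to hold. This is a rule of thumb: heavily skewed data, which are common in healthcare, may need larger samples.
  • A key aspect of the CLT is that the average of the sample means equals the population mean, while the standard deviation of the sample means (the standard error) equals the population standard deviation divided by the square root of the sample size.
  • A sufficiently large random sample can estimate the characteristics of a population accurately.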
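
The points above can be checked with a small simulation. The sketch below assumes a right-skewed exponential "population" with mean 5 (think of a hypothetical length of stay in days; the numbers are not real data) and draws repeated samples of different sizes. The helper names `sample_means`, `skewness` and `draw_skewed` are my own, chosen for illustration.

```python
import random
import statistics


def sample_means(draw, n, num_samples, rng):
    """Return `num_samples` sample means, each computed from `n` independent draws."""
    return [
        statistics.fmean([draw(rng) for _ in range(n)])
        for _ in range(num_samples)
    ]


def skewness(values):
    """Moment-based skewness: 0 for a symmetric distribution such as the normal."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - mu) / sd) ** 3 for v in values])


POP_MEAN = 5.0  # assumed population mean; for an exponential this is also its SD


def draw_skewed(rng):
    """One draw from a right-skewed exponential population with mean POP_MEAN."""
    return rng.expovariate(1.0 / POP_MEAN)


rng = random.Random(2019)  # fixed seed so the run is repeatable
for n in (2, 30, 200):
    means = sample_means(draw_skewed, n, 5000, rng)
    print(
        f"n={n:>3}  mean of means={statistics.fmean(means):.2f}  "
        f"SD of means={statistics.stdev(means):.3f}  "
        f"sigma/sqrt(n)={POP_MEAN / n ** 0.5:.3f}  "
        f"skewness={skewness(means):.2f}"
    )
```

As the sample size grows, the mean of the sample means should stay close to the population mean, their standard deviation should track sigma/sqrt(n), and the skewness should shrink toward 0, which is the CLT at work on data that are far from normal to begin with.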

Here is a link to the tutorial for CLT that I would recommend

Invest time and effort in understanding the CLT

  • The CLT is like Hindustani classical music: a “raag” unfolds in a new form every time an artist performs it!
    - Understanding the applications of the CLT is a continuous learning process. One can find its implications in anything from predicting a political election, to the probability of someone winning a lottery on a given day, to a star franchise player of a premier league team underperforming in a key match, to the outcome of a therapy in a certain cohort of subjects!
  • Keep faith in the CLT and never underestimate its power! Life as a data scientist will then be a cakewalk!
  • If you understand the CLT well, most of your data science work is half done!
  • Learn to apply the Chi-square test. It will help with many of your analytics problems, particularly those involving categorical data, and give you clear insight into your data and data visualizations!
  • The CLT teaches you something new each day of your life, until death! (And maybe even thereafter!)

Adieu 2019. You were good; 2020 will be better!

Here is a link to my personal blog, with my final blog post of the year about the philosophy that has guided me through my life and professional career: In Celebration of being alive!

Bhairavi of the year 2019: suffering ennobles you and makes you a better person!


Written by Prasad Bhave

Healthcare-Life Sciences-Pharma-Insurance | Management Consulting | Clinician-Scientist | IT-ITES, Medical Devices, IoT | https://www.linkedin.com/in/pbhave/
