VCOM Carolinas Campus

Statistical Resources

Statistical Analyses

Statistical analysis refers to a rigorous and sophisticated set of mathematical procedures that are used to systematically organize, summarize, and interpret data. The primary objective of statistical analysis is to identify patterns, relationships, and correlations in data, as well as to test hypotheses and make predictions. The choice of statistical analysis method is contingent on a variety of factors, including the type of data being analyzed, the research question being investigated, and the level of precision required in the analysis. Given the critical role that statistical analysis plays in the research process, it is essential that researchers have a solid understanding of its significance. Through proper statistical analysis, researchers can avoid drawing erroneous conclusions, ensure the reliability and validity of their findings, and ultimately contribute to the advancement of knowledge in their field. Therefore, it is imperative that researchers approach statistical analysis with rigor and care, and seek guidance from experts in the field when necessary.

Sample Size Calculators

One aspect of statistical analysis with significant implications for medical professionals is the calculation of sample size. A sample size calculation determines the minimum number of participants a study needs to achieve a desired level of statistical power. A sufficiently large sample increases the likelihood of detecting meaningful differences between groups, which is crucial for reaching accurate conclusions and decisions about treatments or interventions. An adequately powered study also avoids wasting resources and reduces the risk of false-negative results. Sample size calculation is therefore an essential part of designing, conducting, and interpreting clinical trials and observational studies. The calculation depends on several factors, including the research question, the type of data being collected, the significance level, and the statistical power required. Numerous online resources, including the sample size calculators listed below, can estimate the sample size required to achieve the desired power. These resources should be used in conjunction with expert guidance and tailored to the specific research question and context of the study.

  1. Biostats4You: Power and Sample Size Concepts
    The Biostats4You site contains carefully selected and reviewed training materials especially suited for a non-statistician audience. This part provides the basic concepts of power and sample size.
  2. G*Power: Statistical Power Analyses for Windows and Mac
    This site provides free downloadable software that is easy to use and includes a detailed and helpful user manual. A wide range of statistical procedures is supported, including common mean and proportion tests as well as multiple linear regression, logistic regression, and Poisson regression.
  3. Southwest Oncology Group Statistical Center Power and Sample Size Calculators
    This resource provides online sample size/power calculators for one and two-sample tests of means and proportions as well as for simple survival analyses.
  4. UCSF Sample Size Calculators for Designing Clinical Trials 
    This site provides sample size calculators for the following settings: one group, two independent groups, and paired group designs; tests for means, proportions, and correlation; clustered data; confidence intervals; survival analysis, likelihood ratio (diagnostic test accuracy), posterior probability of disease, and pediatric growth.
  5. Sealed Envelope
    This website provides online tools to estimate the sample size needed for superiority, equivalence, and non-inferiority trials with binary or continuous outcomes.
  6. Genetic Power Calculator
    This site provides automated power analysis for variance components (VC) quantitative trait locus (QTL) linkage and association tests in sibships and other common tests.
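
To make the inputs these calculators ask for more concrete, the sketch below estimates the per-group sample size for a two-sided comparison of two independent means. It is only an illustration: it assumes a normal approximation, equal group sizes, and a common standard deviation, and the function name `n_per_group_two_means` and its parameters are hypothetical rather than taken from any of the tools listed above. It uses the textbook approximation n = 2 * ((z_(1-alpha/2) + z_(power)) * sd / delta)^2.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_two_means(delta, sd, alpha=0.05, power=0.80):
    """Approximate participants per group for a two-sided, two-sample
    comparison of means (hypothetical helper, normal approximation).

    delta: smallest difference in means worth detecting
    sd:    assumed common standard deviation in each group
    alpha: two-sided significance level
    power: desired statistical power (1 - beta)
    """
    if delta == 0 or sd <= 0:
        raise ValueError("delta must be non-zero and sd must be positive")
    if not (0 < alpha < 1 and 0 < power < 1):
        raise ValueError("alpha and power must lie strictly between 0 and 1")

    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # critical value for the two-sided test
    z_power = z.inv_cdf(power)            # quantile matching the desired power

    # Round up: a fractional participant cannot be enrolled.
    return ceil(2 * ((z_alpha + z_power) * sd / delta) ** 2)


# Detect a difference of half a standard deviation with 80% power at alpha = 0.05.
print(n_per_group_two_means(delta=0.5, sd=1.0))  # prints 63
```

Dedicated tools may return a slightly larger number for the same inputs because they typically use the t distribution rather than the normal approximation. A figure from a sketch like this is a starting point for discussion with a biostatistician, not a substitute for it.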

Data Preparation

Data preparation and organization are a critical first step in statistical analysis, ensuring that data are accurate, reliable, and properly formatted. Effective data preparation helps to mitigate errors and biases, resulting in more trustworthy results, and it streamlines the analysis itself. Several online resources are available to assist with data preparation and organization, including tools and guidelines for data cleaning, normalization, and formatting.

  1. REDCap (Research Electronic Data Capture) 
    Using REDCap can greatly simplify data collection and minimize costly and time-consuming data clean-up activities. REDCap is a secure web-based application for building and managing online databases for research and is supported by the CTSC Biomedical Informatics team. Note: VCOM is a REDCap partner; please contact Jim Rathmann, jrathmann@vcom.edu, for assistance with setting up an account.
  2. Biostats4You Data: Data Collection and Management
    The Biostats4You site contains carefully selected and reviewed training materials especially suited for a non-statistician audience. This part provides information about data collection and management.
  3. Guidance for Database Developers for Efficient Import to Statistical Software (PDF)
    This document from the UC Davis Clinical and Translational Science Center Biostatistics Core provides guidelines to facilitate easy and accurate importation of databases (e.g., REDCap) into statistical software (e.g., R, SAS).
  4. Data Organization in Spreadsheets
    Spreadsheets are widely used software tools for data entry, storage, analysis, and visualization. By focusing on the data entry and storage aspects, this article offers practical recommendations for organizing spreadsheet data to reduce errors and ease later analyses.

Statistical Analysis Software

Statistical analysis software provides a powerful tool for organizing, analyzing, and interpreting data, making it easier to identify patterns and correlations in data, test hypotheses, and make predictions. It can save time and effort, increase efficiency, and provide advanced graphical representation of data. It also allows for collaboration, repeatability, and the ability to handle large data sets. For those looking to learn more about popular open-source and free statistical software, there are several online resources available. These resources include tutorials, guides, and online communities, providing support and guidance for those looking to expand their knowledge and skills in using statistical analysis software.

  1. R and Swirl 
    R is a versatile and highly-regarded open-source programming language that is designed specifically for statistical computing and graphical representation. The power of R lies in its modular structure, which allows users to customize the program to meet their specific statistical needs. This is achieved by adding program modules, known as packages, to the base program. For those looking to learn the R programming language, there is a popular course available called Swirl. This course is designed to provide hands-on instruction in the R programming environment, specifically within an R console. It is an excellent resource for those looking to gain a deeper understanding of R programming, and has been integrated into other online courses, such as the HarvardX Statistics and R course. Swirl provides a comprehensive and interactive learning experience for those interested in mastering R programming.
  2. Real Statistics Using Excel
    This site provides comprehensive information on how to perform statistical tests and procedures using Microsoft Excel, including t-tests, ANOVA, repeated measures ANOVA, correlation, simple and multiple linear regression, confidence intervals, and other descriptive statistics. The site offers a free resource pack and example workbooks that can be downloaded. These resources provide hands-on experience with the tests and procedures, giving you a deeper understanding of how to perform these analyses in Excel, whether you are a beginner or have prior experience with statistical analysis.
  3. JASP
    JASP is free, stand-alone software that provides a graphical user interface (GUI) for the R Statistical Computing Software. This user-friendly interface makes it easy to access and use advanced statistical methods and procedures. JASP supports both frequentist and Bayesian statistical analysis. Commonly used methods available through JASP include summary statistics, correlations, two-sample tests, and linear and logistic regression. The software also supports more advanced methods, including mixed-effect models, structural equation models, meta-analysis, principal components analysis, and factor analysis.
  4. Interactive Statistical Calculation Pages
    This resource includes a comprehensive list of sites for many statistical analyses, including power and sample size calculations. The website includes a page titled "Interactive Stats" which provides a list of websites offering interactive analysis tools, and a page titled "Free Software" which includes links to download and run free software packages on your local computer. The website also provides links to numerous technical resources on statistics, including introductory material for those who may be new to the field.

Statistics in Medical/Healthcare Research

Statistics is a powerful tool that allows medical professionals and scientists to make sense of complex data and draw valid conclusions from research studies. Statistics provides methods for evaluating the validity and reliability of results, which is crucial for determining the clinical relevance and generalizability of findings. The increasing popularity of open-source, easy-to-use software such as R for statistical analysis in medical research has made learning how to program and use R a valuable skill for medical professionals. Most online courses about statistics in medical applications use R. There are several useful online resources available, including online courses, tutorials, and online communities. These resources provide support and guidance for medical professionals and scientists looking to expand their knowledge and skills in statistics, and to apply these skills in their research studies.

  1. Statistics and R (Harvard University)
    This course introduces the basic statistical concepts and R programming skills necessary for analyzing life sciences data. It outlines the basics of statistical inference needed to understand and compute p-values and confidence intervals. Examples of programming in R help make the connection between concepts and implementation, and problem sets requiring R programming test understanding and the ability to implement fundamental data analyses. Visualization techniques are used to explore new data sets and determine the most appropriate approach. The course also describes robust statistical techniques as alternatives when data do not fit the assumptions required by the standard methods, and it introduces the basics of using R scripts to conduct reproducible research. The instructors are currently adding lessons to the website on how to do many of the same things in Python.
  2. Introduction to Applied Biostatistics: Statistics for Medical Research (Osaka University)
    This Applied Biostatistics course introduces essential topics in medical statistical concepts and reasoning. Each topic will be presented with examples from published clinical research papers, and all homework assignments will expose the learner to hands-on data analysis using real-life datasets. This course also introduces basic epidemiological concepts covering study designs and sample size computation. Open-source, easy-to-use software, such as R Commander and PS sample size software, will be used.
  3. Biostatistics in Public Health Specialization (Johns Hopkins University)
    This specialization is intended for public health and healthcare professionals, researchers, data analysts, social workers, and others who need a comprehensive concepts-centric biostatistics primer. Those who complete the specialization will be able to read and respond to the scientific literature in public health, medicine, biological science, and related fields, including the Methods and Results sections. Successful learners will also be prepared to participate in a research team.
  4. Biostatistics Resources for Non-Statisticians (Duke University)
    This online series of training videos provides introductory educational materials for learners who are new to clinical research and collaboration. The goal is to give learners a general understanding of clinical research and data analysis to facilitate communication and collaboration with a quantitative expert, such as a biostatistician.
  5. Biostats4you (University of Minnesota)
    The Biostats4you website was developed by the University of Minnesota to serve medical and public health researchers and professionals who wish to learn more about biostatistics. The site contains carefully selected and reviewed training materials especially suited for a non-statistician audience.
  6. Biostatistics for Clinical Researchers (Columbia University)
    This site hosts an online video seminar series from Columbia University Irving Institute for Clinical and Translational Research covering a broad range of statistical topics.
  7. Biostatistics MCW (Medical College of Wisconsin)
    This YouTube seminar series covers a variety of topics including, but not limited to, longitudinal analysis, survival analysis, propensity scores, Bayesian statistics, linear regression, sample size calculations, ANOVA, multiple comparisons, and logistic regression.

In addition to resources specific to biomedical research, below are several useful online resources available for gaining a comprehensive understanding of statistics in general research.

  1. Master Statistics with R (Duke University via Coursera)
    This Coursera course from Duke covers college-level statistics along with how to run the tests in R. The instructor of this series is also involved with a free, open statistics textbook, which is a good complement to the classes or an alternative for students who would prefer to read specific lessons rather than take a class.
  2. Introduction to Biostatistics for Big Data Applications (via edX)
    This Introduction to Biostatistics course provides a basic overview of foundational statistical terms and concepts. The material is categorized into eight successive components. Block 1 provides the distinction between study populations and samples, definitions for different scales of measurement, and an overview of basic descriptive statistics. Block 2 emphasizes the importance of visualizing data during the design and analysis steps. Several different types of graphs are presented. Block 3 covers the basics of hypothesis testing, including confidence intervals, p-values, and potential errors in interpretation. Block 4 walks one through the process of comparing means from two groups (unpaired and paired t-tests). Block 5 introduces the concept of analysis of variance and focuses on one-way ANOVA. Block 6 discusses two-way ANOVA, and Block 7 covers repeated measures ANOVA. Finally, the course is wrapped up with Block 8, which covers statistical hypothesis tests when the assumption of normality is not met. In addition, students in this course will be introduced to the R software package.
  3. Data Science series (Johns Hopkins University via Coursera)
    This series includes ten separate courses that go through the basics of working with data and programming in R (the R Programming course in this series is also linked separately above), data analysis and statistical inference, and machine learning. This Specialization covers the concepts and tools you'll need throughout the entire data science pipeline, from asking the right kinds of questions to making inferences and publishing results. In the final Capstone Project, you'll apply the skills learned by building a data product using real-world data. At completion, students will have a portfolio demonstrating their mastery of the material.  
  4. Introduction to Statistical Learning (Stanford University)
    This is an advanced course that covers introductory machine learning, regression, etc. It is an option for further learning after taking the above courses. You can download the free pdf version of the textbook.

Additional Resources

Including robust statistical analyses in grant applications is essential for demonstrating the importance and feasibility of the proposed research. A sound statistical plan helps to establish the practicality of the research as well as its potential for detecting meaningful differences and answering the research question, which is critical in convincing proposal reviewers of its potential impact. By demonstrating the use of appropriate statistical tools, medical professionals can show that they have the expertise and resources to carry out the proposed research in a rigorous and methodical manner. Several online resources may be useful, including guides, templates, and sample applications.

  • Statistics Guide for Research Grant Applicants
    An excellent handbook that outlines how to prepare the statistical content for grant proposals. Sections include “Describing the Study Design,” “Sample Size Calculations,” and “Describing the Statistical Methods,” among others.
  • Principles and Guidelines for Reporting Preclinical Research
    NIH held a joint workshop in June 2014 with the Nature Publishing Group and Science on the issue of reproducibility and rigor of research findings, with journal editors representing over 30 basic/preclinical science journals in which NIH-funded investigators have most often published. The workshop focused on identifying the common opportunities in the scientific publishing arena to enhance rigor and further support research that is reproducible, robust, and transparent.
  • VCOM Statistical Consulting
    The Research Biostatisticians at VCOM offer comprehensive statistical support, guidance, and education to the researchers at VCOM through customized consulting and collaborative efforts. This encompasses a range of services, including assistance with study design, sample size calculation, data visualization and analysis, as well as interpretation of statistical concepts and analysis results. They provide expert statistical insight and advice to ensure that VCOM researchers have the tools and knowledge necessary to successfully design and execute their studies, and to effectively interpret and communicate their results.
  • Books/Handbooks