
Application of Artificial Intelligence to Healthcare Research Design

An Overview Workshop Utilizing the PICO Framework

Evaluating LDL-C Reduction and Safety of Local vs Original Atorvastatin, With Exercise as an Adjunct, in Adults ≥35 Years: A Synthetic Cohort Study

Instructor

Bhattaraprot Bhabhatsatam, Ph.D.

University Consultant in Information Technology and Digital Transformation, Mahidol University

Resources

QR Code for Workshop Materials

DISCLAIMER

This workshop uses synthetic data only. The dataset, results, and interpretations are created for educational and demonstration purposes. They do not represent real patients, real-world clinical outcomes, or validated medical evidence. The information in this workshop must not be used as a medical reference, clinical guideline, or basis for patient care.

Educational Objectives

Upon completion of this workshop, participants will demonstrate proficiency in:

Methodology: Interactive laboratory exercises utilizing established AI platforms including ChatGPT, Claude, and Gemini

Prerequisites and Technical Requirements

Clinical Foundation

  • Fundamental understanding of PICO framework principles
  • Clinical knowledge of LDL-C, ALT, and CK biomarkers
  • Research methodology foundations
  • Evidence-based practice principles

Technical Infrastructure

  • Computer workstation with reliable internet connectivity
  • Active account with artificial intelligence platform
  • Proficiency in CSV and Excel file management
  • Openness to experimental learning approaches

Additional Preparation: Foundational Python experience is beneficial but not mandatory; the AI will generate the required code

Research Problem

Clinical Research Challenge

Context: Cardiovascular disease is a leading cause of morbidity and mortality worldwide, and elevated low-density lipoprotein cholesterol (LDL-C) is a major modifiable risk factor.

Current Evidence Limitations

Study Population: Adults aged 35 years and above with LDL cholesterol levels exceeding guideline targets by ten percent or more

Primary Research Question

In adults aged 35 years and above whose LDL cholesterol exceeds guideline targets by ten percent or more, does locally manufactured atorvastatin 10 milligrams daily achieve non-inferior LDL cholesterol reduction and a comparable short-term safety profile relative to proprietary atorvastatin 10 milligrams daily, and what additional benefit does structured exercise counseling confer over a three-month intervention period?

Experimental Design Framework

  • Group A: Proprietary Formulation + Standard Care
  • Group B: Proprietary Formulation + Exercise Counseling
  • Group C: Generic Formulation + Standard Care
  • Group D: Generic Formulation + Exercise Counseling

Two-by-Two Factorial Design with Three-Month Follow-up Period

PICO Framework with Artificial Intelligence Integration

Population

AI Methodology:

Filtering electronic health record data, identifying missing entries, and applying inclusion and exclusion criteria

Outcome: One thousand mixed records processed to identify 684 eligible participants

Intervention

AI Methodology:

Patient randomization, group stratification, generation of educational materials

Outcome: Two-by-two factorial randomization with automated generation of patient instructions

Comparison

AI Methodology:

Follow-up data integration, management of missing observations, statistical model execution

Outcome: Analysis of covariance revealing exercise intervention effects

Outcome

AI Methodology:

Generating summary tables, visualizing data, and drafting clinical interpretations

Outcome: Results synthesis with clinical implications

AI-PICO Workflow Diagram

Input Data

📂 synthetic_HIS_1000_mixed.csv (simulated HIS export with noise)

P – Population (Cohort Selection)

AI Task: Clean the raw HIS export, apply inclusion/exclusion criteria, enforce consent requirements

Output: study_cohort_clean.csv (eligible patients only)

I – Intervention (Randomization & Study Tools)

AI Task: Randomize patients (Original vs Local × Exercise vs No Exercise), generate handouts & survey forms, schedule visits

Output:
  • study_cohort_randomized.csv (allocation)
  • study_cohort_randomized_with_dates.csv (with follow-up visits)
  • Patient handouts & surveys

C – Comparison (Analysis)

Input: Case report form (CRF) 📂 visit12_response_template.csv (filled with outcomes)

AI Task: Merge baseline + follow-up, run ANCOVA/regression, create plots

Output: analysis_ready.csv + visualizations (boxplot, adjusted means)

O – Outcome (Results & Interpretation)

AI Task: Summarize outcomes, generate tables, interpret results in plain language

Output:
  • outcome_summary.csv (LDL, HDL, ALT, CK, adherence, AE rates)
  • Plots (adjusted means, outcome charts)
  • Draft text for research report

Final Workshop Deliverables

✓ Cleaned dataset
✓ Randomized trial arms + visit calendar
✓ Analysis-ready dataset
✓ Outcome summary tables & visualizations
✓ AI-written interpretation & report draft

Workshop Structure

  • Lab 1: Population (30 mins)
  • Lab 2: Intervention (30 mins)
  • Lab 3: Comparison (30 mins)
  • Lab 4: Outcome (30 mins)

Laboratory Components:

Technical Prerequisites:

Hardware Requirements

  • Personal computer, notebook, or tablet
  • Reliable internet connectivity
  • Web browser capability
  • File management access (CSV/Excel)

Software Requirements

  • Active Large Language Model account
  • Recommended platforms: Gemini, ChatGPT, or Claude
  • File upload capability within chosen platform
  • Copy-paste functionality for prompt examples

Laboratory Exercise One: Population Selection

Artificial Intelligence Task: Filtering Electronic Health Record Data for Study Eligibility

Objective: Process one thousand hospital records to identify 684 eligible study participants

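Inclusion Criteria

  • Age ≥ 35 years
  • LDL cholesterol ≥ 10% above the guideline target
  • Signed consent (consent_sign = Y)

Exclusion Criteria

  • Pregnancy
  • Statin allergy
  • Diagnosis of liver disease, myopathy, or renal failure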

Expected Outcome:

From 1000 HIS mixed records → AI selects 684 eligible patients (≥35 years, LDL ≥10% above target, consent signed, no pregnancy, no statin allergy, no liver disease, myopathy, or renal failure)

Prompt Example 1 – Natural Language:
You are a clinical data assistant. From this CSV file of hospital patients, select those eligible for our study.
Inclusion criteria:
- Age ≥ 35 years
- LDL cholesterol ≥ 10% above the guideline threshold
- consent_sign = Y
Exclusion criteria:
- pregnancy_flag = 1
- statin_allergy_flag = 1
- diagnosis_codes include liver_disease, myopathy, or renal_failure
Return a clean table with only eligible patients and keep only these columns: patient_id, age, sex, ldl, hdl, alt, ck, diagnosis_codes, consent_sign
Prompt Example 2 – Python Script:
Write a Python script using pandas to filter a dataset named synthetic_HIS_1000_mixed.csv. Apply these rules:
- Include patients with age ≥ 35, LDL ≥ threshold (10% above guideline), and consent_sign = Y.
- Exclude pregnancy_flag = 1, statin_allergy_flag = 1, or diagnosis_codes containing ["liver_disease", "myopathy", "renal_failure"].
- Keep only these columns: patient_id, age, sex, ldl, hdl, alt, ck, diagnosis_codes, consent_sign.
Save the result to study_cohort_clean.csv.
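The rules in these two prompts can be sketched in pandas roughly as below, as an illustration of the kind of script the AI might return rather than a reference implementation. It assumes the column names used in the prompts, noisy `consent_sign` entries, and a single hypothetical LDL target passed as `ldl_target`; the function name `select_eligible` and the demo rows are invented for illustration.

```python
import pandas as pd

EXCLUDED_DIAGNOSES = ["liver_disease", "myopathy", "renal_failure"]
KEEP_COLUMNS = ["patient_id", "age", "sex", "ldl", "hdl", "alt", "ck",
                "diagnosis_codes", "consent_sign"]


def select_eligible(df: pd.DataFrame, ldl_target: float) -> pd.DataFrame:
    """Apply the Lab 1 inclusion/exclusion rules to a raw HIS export.

    ldl_target is an ASSUMED single guideline target (mg/dL); in practice the
    target depends on each patient's cardiovascular risk category.
    """
    # Rows missing age or LDL cannot be assessed, so they are set aside
    df = df.dropna(subset=["age", "ldl"]).copy()
    # Normalize noisy consent entries such as " y" or "y" to "Y"
    consent = df["consent_sign"].fillna("").astype(str).str.strip().str.upper()
    diagnoses = df["diagnosis_codes"].fillna("").astype(str)

    include = (
        (df["age"] >= 35)
        & ((df["ldl"] - ldl_target) / ldl_target >= 0.10)  # at least 10% above target
        & (consent == "Y")
    )
    exclude = (
        (df["pregnancy_flag"].fillna(0) == 1)
        | (df["statin_allergy_flag"].fillna(0) == 1)
        | diagnoses.str.contains("|".join(EXCLUDED_DIAGNOSES), regex=True)
    )
    return df.loc[include & ~exclude, KEEP_COLUMNS].reset_index(drop=True)


# Tiny made-up demo table; a real run would start from
# pd.read_csv("synthetic_HIS_1000_mixed.csv") instead
demo = pd.DataFrame({
    "patient_id": [1, 2, 3, 4, 5],
    "age": [40, 30, 55, 62, None],
    "sex": ["F", "M", "M", "F", "M"],
    "ldl": [150, 160, 115, 170, 180],
    "hdl": [50, 45, 40, 55, 48],
    "alt": [20, 25, 30, 22, 28],
    "ck": [100, 120, 90, 110, 95],
    "diagnosis_codes": ["hypertension", "", None, "myopathy", ""],
    "consent_sign": ["Y", "Y", " y", "Y", "Y"],
    "pregnancy_flag": [0, 0, 0, 0, 0],
    "statin_allergy_flag": [0, 0, 0, 0, 0],
})
cohort = select_eligible(demo, ldl_target=100)  # 100 mg/dL is illustrative only
print(cohort["patient_id"].tolist())  # [1, 3]
```

Writing the result with `cohort.to_csv("study_cohort_clean.csv", index=False)` would produce the cleaned cohort file; the actual target value has to come from whichever guideline the study adopts.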

Laboratory Exercise Two: Intervention Assignment

Artificial Intelligence Tasks: Randomization Protocol and Patient Educational Materials

  • Proprietary Formulation + Standard Care
  • Proprietary Formulation + Exercise Counseling
  • Generic Formulation + Standard Care
  • Generic Formulation + Exercise Counseling

Artificial Intelligence Generated Components:

Expected Outcome:

2×2 factorial randomization → 4 arms (Original vs Local drug × Exercise vs No Exercise), plus AI-generated patient handouts and survey templates

Prompt Example 1 – Natural Language (Randomization):
You are a clinical trial coordinator. We have a dataset of eligible patients in study_cohort_clean.csv. Randomize patients equally into 4 arms (2×2 factorial):
1. Original Drug + No Exercise
2. Original Drug + Exercise
3. Local Drug + No Exercise
4. Local Drug + Exercise
Add new columns: assigned_drug, exercise_advice, random_group. Export the result as study_cohort_randomized.csv
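One way to honor "equally" in this prompt is to shuffle a balanced list of arm labels instead of drawing each patient's arm independently, since independent draws rarely give exactly equal arms. The sketch below assumes the column names from the prompt and an arbitrary seed of 2024; `randomize_2x2` is a hypothetical name. It does not implement the stratification mentioned in the PICO overview (for example by sex or baseline LDL), which would mean running the same shuffle separately within each stratum.

```python
import numpy as np
import pandas as pd

# Arm order follows the prompt: 1 = Original + No Exercise, ..., 4 = Local + Exercise
ARMS = [("original", "no"), ("original", "yes"), ("local", "no"), ("local", "yes")]


def randomize_2x2(cohort: pd.DataFrame, seed: int = 2024) -> pd.DataFrame:
    """Allocate patients to the four factorial arms in (near-)equal numbers."""
    rng = np.random.default_rng(seed)  # fixed seed makes the allocation reproducible
    # Repeat arm indices 0-3 until every patient has one, then shuffle them
    labels = np.resize(np.arange(len(ARMS)), len(cohort))
    rng.shuffle(labels)
    out = cohort.copy()
    out["random_group"] = labels + 1
    out["assigned_drug"] = [ARMS[i][0] for i in labels]
    out["exercise_advice"] = [ARMS[i][1] for i in labels]
    return out


cohort = pd.DataFrame({"patient_id": range(1, 685)})  # stand-in for 684 eligible patients
randomized = randomize_2x2(cohort)
print(randomized["random_group"].value_counts().sort_index().tolist())  # [171, 171, 171, 171]
```

Because 684 divides evenly by four, each arm receives exactly 171 patients; with other cohort sizes the arms differ by at most one.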
Prompt Example 2 – Natural Language (Patient Handouts):
You are a health educator. Write a simple patient handout (1 page, plain Thai language) for each intervention group:
1. Original Drug + No Exercise
2. Original Drug + Exercise
3. Local Drug + No Exercise
4. Local Drug + Exercise
Include:
- What medicine they receive
- If they should exercise or not (and what kind of exercise)
- Reminder to return for blood tests at week 6 and week 12
Prompt Example 3 – Micro-Survey for Patients:
Generate a 4-question survey (Google Form format) for patients after 12 weeks:
- Did you take your medicine regularly? (Yes/No)
- How many days per week did you exercise? (0–7)
- Did you experience any muscle pain? (None/Mild/Severe)
- Did you experience nausea or liver symptoms? (Yes/No)
Prompt Example 4 – Python Script (Randomization & Appointments):
Write a Python script to:
1. Load study_cohort_clean.csv
2. Randomly assign each patient to one of 4 equal-sized groups (Original vs Local × Exercise vs No Exercise).
3. Generate a follow-up schedule: baseline date = today, plus week 6 and week 12 visits.
4. Save to study_cohort_randomized_with_dates.csv
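Step 3 of this prompt amounts to simple date arithmetic. The sketch below assumes one shared baseline date for every patient, as the prompt does; in a real trial each patient's baseline would be their own enrollment date, and any visit windows would come from the protocol. The function name `add_visit_schedule` and the output column names are hypothetical.

```python
from datetime import date, timedelta

import pandas as pd


def add_visit_schedule(df: pd.DataFrame, baseline: date) -> pd.DataFrame:
    """Add baseline, week-6 and week-12 visit dates as ISO strings (YYYY-MM-DD)."""
    out = df.copy()
    # ISO date strings write to CSV unambiguously
    out["baseline_date"] = baseline.isoformat()
    out["visit_week6"] = (baseline + timedelta(weeks=6)).isoformat()
    out["visit_week12"] = (baseline + timedelta(weeks=12)).isoformat()
    return out


# The prompt sets baseline = today (date.today()); a fixed date keeps this reproducible
scheduled = add_visit_schedule(pd.DataFrame({"patient_id": [1, 2]}), date(2025, 1, 6))
print(scheduled.loc[0, ["visit_week6", "visit_week12"]].tolist())  # ['2025-02-17', '2025-03-31']
```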

Laboratory Exercise Three: Statistical Comparison

Artificial Intelligence Tasks: Data Integration and Analysis of Covariance

Statistical Analysis Protocol

Expected Outcome:

Merged 12-week visit results → AI analysis showed that exercise lowered LDL by ~10 mg/dL more than no exercise; drug type made no significant difference

Prompt Example 1 – Natural Language (Data Merge):
You are a data analyst. We have 2 CSVs:
- study_cohort_randomized_with_dates.csv (baseline + allocation)
- visit12_response.csv (12-week outcomes)
Merge them by patient_id. Keep baseline LDL, follow-up LDL, drug type, and exercise assignment. Prepare an analysis-ready table called analysis_ready.csv
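A sketch of this merge in pandas, assuming `visit12_response.csv` stores the 12-week value in a column called `followup_ldl` (hypothetical; the template's real column names are not given here) and renaming the baseline columns to `baseline_ldl`, `drug`, and `exercise` so they match the model formula used later in this lab. A left join with `indicator=True` keeps patients who missed the week-12 visit visible instead of silently dropping them, which is the starting point for deciding how to handle missing observations.

```python
import pandas as pd


def build_analysis_ready(baseline: pd.DataFrame, followup: pd.DataFrame) -> pd.DataFrame:
    """Join allocation/baseline data with 12-week outcomes, one row per patient."""
    base = baseline.rename(columns={
        "ldl": "baseline_ldl",
        "assigned_drug": "drug",
        "exercise_advice": "exercise",
    })
    merged = base.merge(
        followup[["patient_id", "followup_ldl"]],
        on="patient_id",
        how="left",             # keep every randomized patient, even without follow-up
        validate="one_to_one",  # raise an error if a patient_id appears twice
        indicator=True,         # adds _merge = "both" or "left_only"
    )
    merged["has_followup"] = merged["_merge"] == "both"
    return merged[["patient_id", "baseline_ldl", "followup_ldl",
                   "drug", "exercise", "has_followup"]]


# Made-up demo rows for illustration
baseline = pd.DataFrame({
    "patient_id": [1, 2, 3],
    "ldl": [150, 160, 170],
    "assigned_drug": ["original", "local", "local"],
    "exercise_advice": ["no", "yes", "no"],
})
followup = pd.DataFrame({"patient_id": [1, 3], "followup_ldl": [120, 150]})
ready = build_analysis_ready(baseline, followup)
print(ready["has_followup"].tolist())  # [True, False, True]
```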
Prompt Example 2 – Natural Language (Statistical Model):
Perform an ANCOVA:
- Outcome = follow-up LDL
- Covariate = baseline LDL
- Factors = assigned drug (original vs local), exercise (yes vs no), and their interaction
Report adjusted means for each group, and p-values with 95% confidence intervals for each effect. Interpret results in plain language for clinicians.
Prompt Example 3 – Natural Language (Visualization):
Create a boxplot or adjusted means chart showing follow-up LDL across the 4 study groups:
- Original + No Exercise
- Original + Exercise
- Local + No Exercise
- Local + Exercise
Highlight whether exercise or drug type has the stronger effect.
Prompt Example 4 – Python Script (Full Analysis):
Write a Python script using pandas and statsmodels to:
1. Merge baseline cohort with 12-week outcomes.
2. Fit an ANCOVA model: followup_ldl ~ baseline_ldl + drug + exercise + drug:exercise.
3. Output adjusted means for each group.
4. Save analysis results to analysis_results.csv.
5. Create a visualization (boxplot) of LDL by group.

Laboratory Exercise Four: Outcome Interpretation

Artificial Intelligence Tasks: Results Synthesis and Clinical Translation

Artificial Intelligence Generated Components:

Expected Outcome:

Outcome summary table + visualization; AI explains: "Exercise is the dominant factor; the drug-type effect is negligible"

Anticipated Clinical Findings:

Prompt Example 1 – Natural Language (Summary Table):
You are a medical research assistant. From analysis_ready.csv, create an outcome summary table with:
- Group (drug × exercise)
- N patients
- Mean baseline LDL
- Mean follow-up LDL
- Mean LDL change
- Mean ALT and CK
- Adherence rate
- Adverse event rate
Format the table for publication (rounded, clear labels).
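Once per-patient ALT, CK, adherence, and adverse-event fields are present in the analysis file (the Lab 3 merge prompt keeps only LDL and allocation, so they would need to be carried through), this summary table reduces to a grouped aggregation. The column names `alt`, `ck`, `adherent`, and `adverse_event` are assumptions, with the last two coded 0/1; `outcome_summary` is a hypothetical function name and the demo rows are invented placeholders.

```python
import pandas as pd


def outcome_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Per-arm summary; adherent and adverse_event are assumed 0/1 columns."""
    d = df.assign(ldl_change=df["followup_ldl"] - df["baseline_ldl"])
    table = (
        d.groupby(["drug", "exercise"])
        .agg(
            n=("patient_id", "count"),
            baseline_ldl=("baseline_ldl", "mean"),
            followup_ldl=("followup_ldl", "mean"),
            ldl_change=("ldl_change", "mean"),
            alt=("alt", "mean"),
            ck=("ck", "mean"),
            adherence_pct=("adherent", "mean"),
            ae_pct=("adverse_event", "mean"),
        )
        .reset_index()
    )
    # Means of 0/1 indicators are proportions; convert to percentages
    table[["adherence_pct", "ae_pct"]] = table[["adherence_pct", "ae_pct"]] * 100
    return table.round(1)  # rounds numeric columns only


demo = pd.DataFrame({
    "patient_id": [1, 2, 3, 4],
    "drug": ["original", "original", "local", "local"],
    "exercise": ["yes", "yes", "no", "no"],
    "baseline_ldl": [160, 170, 150, 180],
    "followup_ldl": [120, 130, 140, 160],
    "alt": [20, 30, 25, 35],
    "ck": [100, 120, 110, 130],
    "adherent": [1, 1, 1, 0],
    "adverse_event": [0, 1, 0, 0],
})
summary = outcome_summary(demo)
print(summary["ldl_change"].tolist())  # [-15.0, -40.0]
```

Saving with `summary.to_csv("outcome_summary.csv", index=False)` gives the file named in the workflow diagram; publication labels can then be applied with `DataFrame.rename`.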
Prompt Example 2 – Natural Language (Interpretation):
Interpret the results of our study in plain clinical language:
- Which factor had the strongest effect on LDL reduction?
- Did original vs local drug show meaningful differences?
- Were safety markers (ALT, CK) acceptable?
- Summarize in 3–4 sentences for physicians.
Prompt Example 3 – Natural Language (Report Generation):
Draft the Results section of a research report using the analysis results. Include:
- Main findings on LDL changes
- Safety observations
- Adherence/AEs
- Clinical implication (exercise importance, drug equivalence)
Make it suitable for a journal submission.
Prompt Example 4 – Python Script (Visualization & Export):
Write a Python script to:
1. Read analysis_results.csv
2. Generate a bar chart of mean LDL reduction by group
3. Generate a line chart of LDL change from baseline to follow-up for each group
4. Save all plots as PNG files
5. Export summary stats into outcome_summary.csv
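A sketch of this export step with matplotlib, assuming a per-arm table with `drug`, `exercise`, `baseline_ldl`, `followup_ldl`, and `ldl_change` columns (the shape of a per-group summary rather than the raw `analysis_results.csv`, whose layout the prompt leaves open). The function name `save_outputs` and the PNG file names are made up for the example, and the demo numbers are placeholders, not workshop results.

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # draw to files; no display needed
import matplotlib.pyplot as plt
import pandas as pd


def save_outputs(summary: pd.DataFrame, out_dir: Path) -> list:
    """Write a bar chart, a line chart and outcome_summary.csv to out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    labels = (summary["drug"].str.capitalize() + " + "
              + summary["exercise"].map({"yes": "Exercise", "no": "No exercise"}))

    # Bar chart: mean LDL reduction (positive = LDL went down)
    fig, ax = plt.subplots(figsize=(7, 4))
    ax.bar(labels, -summary["ldl_change"])
    ax.set_ylabel("Mean LDL reduction (mg/dL)")
    ax.tick_params(axis="x", labelrotation=20)
    fig.tight_layout()
    fig.savefig(out_dir / "ldl_reduction_bar.png", dpi=150)
    plt.close(fig)

    # Line chart: mean LDL at baseline and week 12 for each arm
    fig, ax = plt.subplots(figsize=(7, 4))
    for label, (_, row) in zip(labels, summary.iterrows()):
        ax.plot(["Baseline", "Week 12"], [row["baseline_ldl"], row["followup_ldl"]],
                marker="o", label=label)
    ax.set_ylabel("Mean LDL (mg/dL)")
    ax.legend()
    fig.tight_layout()
    fig.savefig(out_dir / "ldl_change_line.png", dpi=150)
    plt.close(fig)

    summary.to_csv(out_dir / "outcome_summary.csv", index=False)
    return sorted(out_dir.iterdir())


# Placeholder numbers for illustration only
summary = pd.DataFrame({
    "drug": ["local", "local", "original", "original"],
    "exercise": ["no", "yes", "no", "yes"],
    "baseline_ldl": [165.0, 166.0, 164.0, 165.0],
    "followup_ldl": [130.0, 120.0, 131.0, 121.0],
})
summary["ldl_change"] = summary["followup_ldl"] - summary["baseline_ldl"]
written = save_outputs(summary, Path(tempfile.mkdtemp()))  # use your own folder in practice
print([p.name for p in written])  # ['ldl_change_line.png', 'ldl_reduction_bar.png', 'outcome_summary.csv']
```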

Effective Artificial Intelligence Communication Strategies for Research

Precision in Specifications

Define explicit inclusion and exclusion criteria, exact variable names, and the required output format

Professional Role Assignment

Establish context through role specification such as "clinical data specialist" or "biostatistician"

Sequential Task Decomposition

Divide complex analytical procedures into discrete, manageable sequential components

Clinical Interpretation Request

Explicitly request translation of results into clinically appropriate language for healthcare practitioners

Dual Methodological Approaches per Laboratory Exercise:

Natural Language Prompting

Instruct the AI directly with plain-English specifications

Code Generation Protocols

Ask the AI to write Python or R scripts for reproducible analytical workflows

Workshop Implementation Guidelines and Best Practices

Initial Implementation Procedures

Common Implementation Challenges and Resolution Strategies

Communication Difficulties

Resolution: Enhance specificity regarding variable nomenclature, analytical criteria, and desired output specifications

Analytical Inaccuracies

Resolution: Explicitly define statistical methodology and mathematical model structure requirements

Data Quality Issues

Resolution: Ask the AI to assess missing values, outliers, and data integrity before any analysis

Interpretation Limitations

Resolution: Specifically request clinical translation appropriate for healthcare practitioners with relevant context
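For the data quality issue above, a first-pass check of the kind the AI could be asked to run might look like the sketch below. The 1.5 × IQR fence is a common screening heuristic assumed here, not a rule from the workshop, and flagged values should go to clinical review rather than be deleted automatically (an LDL of 900 mg/dL in the demo could be a unit error, a typo, or a real extreme). The function name `data_quality_report` is hypothetical, and it reports numeric columns only.

```python
import pandas as pd


def data_quality_report(df: pd.DataFrame, id_col: str = "patient_id"):
    """Count missing values and 1.5*IQR outliers per numeric column, plus duplicate IDs."""
    numeric = df.select_dtypes("number").drop(columns=[id_col], errors="ignore")
    q1, q3 = numeric.quantile(0.25), numeric.quantile(0.75)
    iqr = q3 - q1
    # NaN comparisons are False, so missing values are not counted as outliers
    outside = (numeric < q1 - 1.5 * iqr) | (numeric > q3 + 1.5 * iqr)
    report = pd.DataFrame({
        "missing": numeric.isna().sum(),
        "iqr_outliers": outside.sum(),
    })
    n_duplicate_ids = int(df[id_col].duplicated().sum())
    return report, n_duplicate_ids


# Made-up demo rows: one missing LDL, one implausible LDL, one repeated ID
demo = pd.DataFrame({
    "patient_id": [1, 2, 2, 4, 5],
    "ldl": [150, 160, None, 155, 900],
    "alt": [20, 22, 25, 21, 23],
})
report, dupes = data_quality_report(demo)
print(report)
print("duplicate patient_ids:", dupes)  # duplicate patient_ids: 1
```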

Principal Learning Outcomes

Research Process Enhancement

  • Data management: Multiple hours reduced to minutes
  • Randomization procedures: Manual processes converted to automated systems
  • Statistical analysis: Complex methodologies rendered accessible
  • Result interpretation: Technical findings translated to clinical applications

Research Integrity Preservation

  • PICO framework continues to direct study design
  • Clinical expertise remains fundamental to research quality
  • Statistical principles maintain consistency with established methodology
  • Peer review processes preserved without modification

Artificial intelligence serves as an advanced research assistant, not a substitute for clinical expertise and professional judgment

Utilize these technologies to manage routine analytical tasks, enabling greater focus on research design, interpretation, and clinical application

Future Trends in AI-Enhanced Healthcare Research

Today's Workshop: Foundation Building

This laboratory exercise demonstrates accessible AI integration that every researcher can implement immediately

Advanced AI Technologies on the Horizon

AI Agents for Autonomous Research

  • Autonomous data collection and preprocessing without human intervention
  • Self-directed hypothesis generation and experimental design
  • Continuous learning from research outcomes and literature
  • Multi-agent collaboration for complex research protocols

Real-time Health Information System Integration

  • Model Context Protocol (MCP): Standardized connections between AI models and external tools and data sources, such as hospital systems
  • Retrieval-Augmented Generation (RAG): Dynamic integration of current medical literature
  • Live patient data streaming for prospective research studies
  • Automated compliance monitoring and ethical oversight

Comprehensive AI Research Platforms

  • Google NotebookLM: AI-powered research synthesis and collaboration
  • Integrated literature review, data analysis, and manuscript preparation
  • Multi-modal data processing (text, images, genomics, wearables)
  • Automated peer review and quality assessment systems

Your Research Journey Forward

  • Today: Basic AI integration with the PICO framework
  • Next 6 Months: Implement AI agents for routine research tasks
  • Next 2 Years: Full integration with hospital systems and advanced platforms

Begin with today's foundations, then progressively adopt emerging technologies as they mature