Educational Objectives
Upon completion of this workshop, participants will demonstrate proficiency in:
- Integration of artificial intelligence methodologies into each component of the PICO framework
- Application of AI-assisted tools for systematic filtering and validation of synthetic patient datasets
- Implementation of computerized randomization protocols and generation of patient educational materials
- Execution of statistical analyses utilizing artificial intelligence support systems
- Synthesis of research outcomes and development of clinically-oriented interpretations
Methodology: Interactive laboratory exercises utilizing established AI platforms including ChatGPT, Claude, and Gemini
Primary Research Question
In adults aged 35 years and above whose LDL cholesterol exceeds guideline targets by ten percent or more, is locally manufactured atorvastatin 10 milligrams daily non-inferior to proprietary atorvastatin 10 milligrams daily in LDL cholesterol reduction, with a comparable short-term safety profile, and what additional therapeutic benefit does structured exercise counseling confer over a three-month intervention period?
Experimental Design Framework
Group A: Proprietary Formulation + Standard Care
Group B: Proprietary Formulation + Exercise Counseling
Group C: Generic Formulation + Standard Care
Group D: Generic Formulation + Exercise Counseling
Two-by-Two Factorial Design with Three-Month Follow-up Period
Laboratory Exercise One: Population Selection
Artificial Intelligence Task: Electronic Health Record Data Filtration for Study Eligibility
Objective: Process one thousand hospital records to identify 684 eligible study participants
Inclusion Criteria
- Age greater than or equal to 35 years
- LDL cholesterol concentration exceeding guideline threshold by ten percent or greater
- Documented informed consent
Exclusion Criteria
- Pregnancy status
- Known statin hypersensitivity
- Documented hepatic disease, myopathy, or renal insufficiency
Expected Outcome:
From 1,000 mixed HIS (hospital information system) records → AI selects 684 eligible patients (≥35 years, LDL ≥10% above target, consent signed, no pregnancy, no statin allergy, no hepatic disease, myopathy, or renal insufficiency)
Prompt Example 1 – Natural Language:
You are a clinical data assistant.
From this CSV file of hospital patients, select those eligible for our study.
Inclusion criteria:
- Age ≥ 35 years
- LDL cholesterol ≥ 10% above guideline threshold
- consent_sign = Y
Exclusion criteria:
- pregnancy_flag = 1
- statin_allergy_flag = 1
- Diagnosis includes liver_disease, myopathy, or renal_failure
Return a clean table with only eligible patients and keep only these columns:
patient_id, age, sex, ldl, hdl, alt, ck, diagnosis_codes, consent_sign
Prompt Example 2 – Python Script:
Write a Python script in pandas to filter a dataset named synthetic_HIS_1000_mixed.csv.
Apply these rules:
- Include patients with age ≥ 35, LDL ≥ threshold (10% above guideline), and consent_sign = Y.
- Exclude pregnancy_flag=1, statin_allergy_flag=1, or diagnosis_codes containing
["liver_disease","myopathy","renal_failure"].
- Keep only these columns: patient_id, age, sex, ldl, hdl, alt, ck, diagnosis_codes, consent_sign.
Save result to study_cohort_clean.csv.
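The filtering rules above can be sketched in pandas roughly as follows. This is a minimal illustration, not the workshop's reference solution. The guideline LDL target is not stated in this material, so LDL_TARGET_MG_DL is a placeholder, and the flag columns are assumed to be coded 0/1 with diagnosis_codes stored as text.

```python
import pandas as pd

# Assumption: the guideline LDL target is not given in the workshop material;
# 100 mg/dL is a placeholder to be replaced with the protocol's value.
LDL_TARGET_MG_DL = 100.0
LDL_THRESHOLD = LDL_TARGET_MG_DL * 110 / 100  # 10% above target

EXCLUDED_DIAGNOSES = ["liver_disease", "myopathy", "renal_failure"]
KEEP_COLUMNS = ["patient_id", "age", "sex", "ldl", "hdl", "alt", "ck",
                "diagnosis_codes", "consent_sign"]


def select_eligible(records: pd.DataFrame) -> pd.DataFrame:
    """Return rows that meet every inclusion rule and no exclusion rule."""
    diagnoses = records["diagnosis_codes"].fillna("")
    has_excluded_dx = diagnoses.str.contains("|".join(EXCLUDED_DIAGNOSES))
    eligible = (
        (records["age"] >= 35)
        & (records["ldl"] >= LDL_THRESHOLD)
        & (records["consent_sign"] == "Y")
        & (records["pregnancy_flag"] != 1)
        & (records["statin_allergy_flag"] != 1)
        & ~has_excluded_dx
    )
    return records.loc[eligible, KEEP_COLUMNS].reset_index(drop=True)


# Tiny in-memory example; in the exercise the input would come from
# pd.read_csv("synthetic_HIS_1000_mixed.csv").
demo = pd.DataFrame({
    "patient_id": [1, 2, 3, 4, 5, 6, 7],
    "age": [50, 30, 60, 45, 55, 40, 35],
    "sex": ["F", "M", "M", "F", "M", "F", "M"],
    "ldl": [150, 160, 105, 140, 170, 130, 120],
    "hdl": [45, 50, 55, 40, 42, 48, 51],
    "alt": [22, 30, 25, 28, 60, 24, 26],
    "ck": [100, 120, 90, 110, 95, 105, 98],
    "diagnosis_codes": ["hypertension", "", "", "diabetes",
                        "diabetes;liver_disease", "", None],
    "consent_sign": ["Y", "Y", "Y", "N", "Y", "Y", "Y"],
    "pregnancy_flag": [0, 0, 0, 0, 0, 0, 0],
    "statin_allergy_flag": [0, 0, 0, 0, 0, 1, 0],
})
cohort = select_eligible(demo)
print(cohort["patient_id"].tolist())
```

Comparing the row count of the result with the expected 684 is a quick check that an AI-generated script applied every rule, including consent.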
Laboratory Exercise Two: Intervention Assignment
Artificial Intelligence Tasks: Randomization Protocol and Patient Educational Materials
Proprietary Formulation + Standard Care
Proprietary Formulation + Exercise Counseling
Generic Formulation + Standard Care
Generic Formulation + Exercise Counseling
Artificial Intelligence Generated Components:
- Randomization algorithm with balanced allocation ratio
- Patient educational materials customized for each treatment group
- Follow-up questionnaire instruments
- Appointment scheduling protocol for weeks six and twelve
Expected Outcome:
2×2 factorial randomization → 4 arms (Original vs Local drug × Exercise vs No Exercise), plus AI-generated patient handouts and survey templates
Prompt Example 1 – Natural Language (Randomization):
You are a clinical trial coordinator.
We have a dataset of eligible patients in study_cohort_clean.csv.
Randomize patients into 4 arms equally (2×2 factorial):
1. Original Drug + No Exercise
2. Original Drug + Exercise
3. Local Drug + No Exercise
4. Local Drug + Exercise
Add new columns: assigned_drug, exercise_advice, random_group.
Export result as study_cohort_randomized.csv
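One way to obtain equal arm sizes is permuted-block randomization, sketched below with the standard library only. The source does not name a randomization method, so the block approach, the seed, and the lowercase arm labels are assumptions for illustration. The output keys mirror the column names requested in the prompt.

```python
import random
from collections import Counter

# Arm order follows the numbered list in the prompt (groups 1 to 4).
ARMS = [
    ("original", "no_exercise"),
    ("original", "exercise"),
    ("local", "no_exercise"),
    ("local", "exercise"),
]


def block_randomize(patient_ids, seed):
    """Assign patients to the four arms in shuffled blocks of four."""
    rng = random.Random(seed)  # fixed seed keeps the allocation reproducible
    allocation = {}
    for start in range(0, len(patient_ids), len(ARMS)):
        block = ARMS.copy()
        rng.shuffle(block)
        for pid, arm in zip(patient_ids[start:start + len(ARMS)], block):
            allocation[pid] = {
                "assigned_drug": arm[0],
                "exercise_advice": arm[1],
                "random_group": ARMS.index(arm) + 1,
            }
    return allocation


# 684 eligible patients divide evenly into 171 blocks of four.
patient_ids = list(range(1, 685))
allocation = block_randomize(patient_ids, seed=2024)
arm_counts = Counter(row["random_group"] for row in allocation.values())
print(sorted(arm_counts.items()))
```

Because each block contains every arm exactly once, the arms stay balanced throughout enrollment, which simple per-patient random draws do not guarantee.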
Prompt Example 2 – Natural Language (Patient Handouts):
You are a health educator.
Write a simple patient handout (1 page, plain Thai language) for each intervention group:
1. Original Drug + No Exercise
2. Original Drug + Exercise
3. Local Drug + No Exercise
4. Local Drug + Exercise
Include:
- What medicine they receive
- If they should exercise or not (and what kind of exercise)
- Reminder to return for blood tests at week 6 and week 12
Prompt Example 3 – Micro-Survey for Patients:
Generate a 4-question survey (Google Form format) for patients after 12 weeks:
- Did you take your medicine regularly? (Yes/No)
- How many days per week did you exercise? (0–7)
- Did you experience any muscle pain? (None/Mild/Severe)
- Did you experience nausea or liver symptoms? (Yes/No)
Prompt Example 4 – Python Script (Randomization & Appointments):
Write a Python script to:
1. Load study_cohort_clean.csv
2. Randomly assign each patient to one of 4 groups (Original vs Local × Exercise vs No).
3. Generate a follow-up schedule: baseline date = today, plus week 6 and week 12 visits.
4. Save to study_cohort_randomized_with_dates.csv
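The follow-up schedule in step 3 is plain date arithmetic. A minimal sketch, assuming the visit column names shown here (they are not specified in the workshop files) and using a fixed example date in place of today's date:

```python
from datetime import date, timedelta


def follow_up_schedule(baseline):
    """Return the baseline date with the week 6 and week 12 visit dates."""
    return {
        "baseline_date": baseline,
        "week6_date": baseline + timedelta(weeks=6),
        "week12_date": baseline + timedelta(weeks=12),
    }


# A fixed date keeps the example reproducible; the exercise would use date.today().
schedule = follow_up_schedule(date(2025, 1, 6))
for visit, when in schedule.items():
    print(visit, when.isoformat())
```

Applying the function to each randomized patient and writing the three dates as extra columns would produce the file described in step 4.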
Laboratory Exercise Three: Statistical Comparison
Artificial Intelligence Tasks: Data Integration and Analysis of Covariance
Statistical Analysis Protocol
- Primary Endpoint: Mean change in LDL cholesterol concentration from baseline to three months
- Statistical Methodology: Analysis of covariance with baseline LDL cholesterol as covariate
- Independent Variables: Pharmaceutical formulation, exercise counseling, and interaction effects
- Safety Assessments: Alanine aminotransferase, creatine kinase concentrations, and muscular symptoms
Expected Outcome:
Merged 12-week visit results → AI analysis shows exercise counseling lowers LDL by about 10 mg/dL more than standard care; no significant difference between drug types
Prompt Example 1 – Natural Language (Data Merge):
You are a data analyst.
We have 2 CSVs:
- study_cohort_randomized_with_dates.csv (baseline + allocation)
- visit12_response.csv (12-week outcomes)
Merge them by patient_id.
Keep baseline LDL, follow-up LDL, drug type, exercise assignment, and the safety and adherence fields (ALT, CK, symptoms, adherence) needed for the later summaries.
Prepare an analysis-ready table called analysis_ready.csv
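The merge can be sketched in pandas as below. The name of the 12-week LDL column (ldl_week12) is an assumption, since the layout of visit12_response.csv is not shown here. The sketch also reports patients without a 12-week record instead of dropping them silently.

```python
import pandas as pd


def build_analysis_table(baseline, visit12):
    """Merge allocation and 12-week outcomes; return (table, missing ids)."""
    merged = baseline.merge(
        visit12,
        on="patient_id",
        how="left",
        validate="one_to_one",  # raises if a patient_id is duplicated
        indicator=True,         # adds _merge: "both" or "left_only"
    )
    missing = merged.loc[merged["_merge"] == "left_only", "patient_id"].tolist()
    complete = merged[merged["_merge"] == "both"].drop(columns="_merge")
    complete = complete.rename(
        columns={"ldl": "baseline_ldl", "ldl_week12": "followup_ldl"}
    )
    columns = ["patient_id", "baseline_ldl", "followup_ldl",
               "assigned_drug", "exercise_advice"]
    return complete[columns].reset_index(drop=True), missing


# Small in-memory stand-ins for the two CSV files.
baseline = pd.DataFrame({
    "patient_id": [1, 2, 3],
    "ldl": [150.0, 162.0, 171.0],
    "assigned_drug": ["original", "local", "local"],
    "exercise_advice": ["exercise", "no_exercise", "exercise"],
})
visit12 = pd.DataFrame({"patient_id": [1, 3], "ldl_week12": [118.0, 125.0]})

analysis, missing_ids = build_analysis_table(baseline, visit12)
print(analysis)
print("No 12-week record:", missing_ids)
```

The validate argument is a cheap guard against duplicated patient records, which would otherwise inflate the sample size after the join.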
Prompt Example 2 – Natural Language (Statistical Model):
Perform an ANCOVA analysis:
Outcome = follow-up LDL
Covariate = baseline LDL
Factors = assigned drug (original vs local), exercise (yes vs no), and their interaction.
Report adjusted means and p-values for each group.
Interpret results in plain language for clinicians.
Prompt Example 3 – Natural Language (Visualization):
Create a boxplot or adjusted means chart showing follow-up LDL across the 4 study groups:
- Original + No Exercise
- Original + Exercise
- Local + No Exercise
- Local + Exercise
Highlight whether exercise or drug type has the stronger effect.
Prompt Example 4 – Python Script (Full Analysis):
Write a Python script using pandas and statsmodels to:
1. Merge baseline cohort with 12-week outcomes.
2. Fit an ANCOVA model: followup_ldl ~ baseline_ldl + drug + exercise + drug*exercise.
3. Output adjusted means for each group.
4. Save analysis results to analysis_results.csv.
5. Create a visualization (boxplot) of LDL by group.
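A minimal sketch of the ANCOVA fit and adjusted means with statsmodels follows. The data are simulated inside the script purely so that it runs on its own. They are not study results, and the 10 mg/dL exercise effect is built into the simulation to mirror the expected outcome above. The 0/1 coding and the column names drug_local and exercise are assumptions. In a statsmodels formula, `a * b` expands to both main effects plus their interaction, so the main effects need not be listed separately.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated stand-in for analysis_ready.csv (illustration only).
rng = np.random.default_rng(0)
n = 400
data = pd.DataFrame({
    "baseline_ldl": rng.normal(160, 20, n),
    "drug_local": np.tile([0, 0, 1, 1], n // 4),  # 1 = local formulation
    "exercise": np.tile([0, 1, 0, 1], n // 4),    # 1 = exercise counseling
})
data["followup_ldl"] = (
    0.6 * data["baseline_ldl"]
    - 10 * data["exercise"]        # simulated exercise effect
    + rng.normal(0, 8, n)          # no drug effect is simulated
)

# ANCOVA: baseline LDL as covariate, two factors and their interaction.
model = smf.ols(
    "followup_ldl ~ baseline_ldl + drug_local * exercise", data=data
).fit()
print(model.params.round(2))
print(model.pvalues.round(4))

# Adjusted means: predicted follow-up LDL for each arm at the mean baseline.
grid = pd.DataFrame({
    "baseline_ldl": data["baseline_ldl"].mean(),
    "drug_local": [0, 0, 1, 1],
    "exercise": [0, 1, 0, 1],
})
grid["adjusted_mean"] = model.predict(grid)
print(grid.round(1))
```

Because the model includes an interaction, the exercise coefficient is the exercise effect within the original-drug arms; the adjusted means table is usually the easier output to present to clinicians.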
Laboratory Exercise Four: Outcome Interpretation
Artificial Intelligence Tasks: Results Synthesis and Clinical Translation
Artificial Intelligence Generated Components:
- Comprehensive summary tables including baseline characteristics and outcomes by treatment group
- Statistical visualizations including box plots, bar charts, and trend analyses
- Statistical interpretation incorporating p-values, effect sizes, and confidence intervals
- Clinical implications translated for healthcare practitioners
- Manuscript preparation including results section suitable for peer review
Expected Outcome:
Outcome summary table + visualization; AI explains: "Exercise is the dominant factor; the drug type effect is negligible"
Anticipated Clinical Findings:
- Exercise counseling represents the primary determinant of therapeutic response
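- Non-inferiority of the generic formulation, supported only if the confidence interval for the between-formulation LDL difference lies within a prespecified margin (a non-significant difference alone does not establish it)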
- Safety profiles acceptable across all treatment groups based on hepatic and muscular biomarkers
- Economic implications: if non-inferiority holds, the findings support a transition to generic formulations
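The non-inferiority claim can be checked with a short calculation. This is a sketch under stated assumptions: the margin of 6 mg/dL and the example estimate and standard error are hypothetical placeholders, not values from this study, and the bound uses a normal approximation. The difference is defined as local minus original in adjusted follow-up LDL, so a positive value means the local drug performs worse.

```python
from statistics import NormalDist


def is_non_inferior(difference, std_error, margin, one_sided_alpha=0.025):
    """Return (upper confidence bound, verdict) for a higher-is-worse difference."""
    z = NormalDist().inv_cdf(1 - one_sided_alpha)
    upper_bound = difference + z * std_error
    # Non-inferior only if even the upper bound stays below the margin.
    return upper_bound, upper_bound < margin


# Hypothetical margin and inputs for illustration only.
HYPOTHETICAL_MARGIN_MG_DL = 6.0
upper, verdict = is_non_inferior(
    difference=1.0, std_error=1.2, margin=HYPOTHETICAL_MARGIN_MG_DL
)
print(round(upper, 2), verdict)
```

The design point is that the decision rests on the confidence bound relative to the margin, not on the p-value for the drug effect.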
Prompt Example 1 – Natural Language (Summary Table):
You are a medical research assistant.
From analysis_ready.csv, create an outcome summary table with:
- Group (drug × exercise)
- N patients
- Mean baseline LDL
- Mean follow-up LDL
- Mean LDL change
- Mean ALT and CK
- Adherence rate
- Adverse event rate
Format the table for publication (rounded, clear labels).
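The LDL columns of such a summary table can be produced with a pandas group-by, sketched below. The input values are made-up placeholders, and the column names follow the earlier prompts where available. ALT, CK, adherence, and adverse-event columns would follow the same pattern if they are present in the analysis table.

```python
import pandas as pd


def outcome_summary(table):
    """Summarize LDL outcomes per drug × exercise group, rounded for reporting."""
    table = table.assign(ldl_change=table["followup_ldl"] - table["baseline_ldl"])
    return (
        table.groupby(["assigned_drug", "exercise_advice"])
        .agg(
            n_patients=("patient_id", "count"),
            mean_baseline_ldl=("baseline_ldl", "mean"),
            mean_followup_ldl=("followup_ldl", "mean"),
            mean_ldl_change=("ldl_change", "mean"),
        )
        .round(1)
        .reset_index()
    )


# Placeholder rows standing in for analysis_ready.csv.
demo_table = pd.DataFrame({
    "patient_id": [1, 2, 3, 4, 5, 6, 7, 8],
    "assigned_drug": ["original"] * 4 + ["local"] * 4,
    "exercise_advice": ["no_exercise", "no_exercise", "exercise", "exercise"] * 2,
    "baseline_ldl": [160, 170, 160, 170, 150, 170, 150, 170],
    "followup_ldl": [120, 130, 110, 120, 110, 130, 100, 120],
})
summary = outcome_summary(demo_table)
print(summary)
```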
Prompt Example 2 – Natural Language (Interpretation):
Interpret the results of our study in plain clinical language:
- Which factor had the strongest effect on LDL reduction?
- Did original vs local drug show meaningful differences?
- Were safety markers (ALT, CK) acceptable?
- Summarize in 3–4 sentences for physicians.
Prompt Example 3 – Natural Language (Report Generation):
Draft the Results section of a research report using the analysis results.
Include:
- Main findings on LDL changes
- Safety observations
- Adherence/AEs
- Clinical implication (exercise importance, drug equivalence)
Make it suitable for a journal submission.
Prompt Example 4 – Python Script (Visualization & Export):
Write a Python script to:
1. Read analysis_results.csv
2. Generate a bar chart of mean LDL reduction by group
3. Generate a line chart of LDL change from baseline to follow-up for each group
4. Save all plots as PNG files
5. Export summary stats into outcome_summary.csv
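The bar chart in step 2 might be produced with matplotlib along these lines. The reduction values are placeholders for illustration, not study results, and the file is written to a temporary directory here only so the sketch leaves nothing behind; the exercise would save into the working folder.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # render to files without needing a display
import matplotlib.pyplot as plt


def save_reduction_chart(mean_reduction, path):
    """Save a bar chart of mean LDL reduction per group as a PNG file."""
    fig, ax = plt.subplots(figsize=(7, 4))
    ax.bar(list(mean_reduction.keys()), list(mean_reduction.values()))
    ax.set_ylabel("Mean LDL reduction (mg/dL)")
    ax.set_title("LDL reduction by study group")
    ax.tick_params(axis="x", labelrotation=20)
    fig.tight_layout()
    fig.savefig(path, dpi=150)
    plt.close(fig)
    return path


# Placeholder values; real numbers would come from analysis_results.csv.
placeholder_reduction = {
    "Original + No Exercise": 40.0,
    "Original + Exercise": 50.0,
    "Local + No Exercise": 40.0,
    "Local + Exercise": 50.0,
}
chart_path = save_reduction_chart(
    placeholder_reduction,
    os.path.join(tempfile.mkdtemp(), "ldl_reduction_by_group.png"),
)
print(chart_path)
```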