45 AI Prompts for Data Scientists
Data science keeps evolving, and ChatGPT has changed how many practitioners approach complex analytical tasks. Whether you're cleaning messy datasets, building predictive models, or extracting insights from unstructured data, well-structured prompts can cut the time you spend on routine work and surface approaches you might not have considered.
Best AI Prompts for Data Science
Modern data scientists face increasingly complex challenges, from handling massive datasets to communicating findings across different business units. ChatGPT can serve as an analytical partner, helping you think through problems systematically and automate routine tasks that would otherwise eat up valuable time.
The prompts below cover scenarios you're likely to encounter every day. Some focus on technical implementation, while others help with strategic thinking and communication. Words in curly braces, such as {dataset_description}, are variables: replace them with the details of your own project before sending the prompt. Each prompt is designed to complement your existing workflow, not replace your expertise.
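Every prompt in this list uses variables in curly braces. If you keep the templates in a script rather than pasting them by hand, a small helper can stop you from sending a prompt that still has a placeholder unfilled. This is a minimal sketch built on Python's standard `string.Formatter` class. The helper names `template_fields` and `fill_prompt` are hypothetical and are not part of ChatGPT or any library.

```python
import string

# Hypothetical helpers for filling the prompt templates in this article.

def template_fields(template: str) -> set:
    """Return the names of all {placeholder} fields in a prompt template."""
    return {
        field_name
        for _, field_name, _, _ in string.Formatter().parse(template)
        if field_name  # parse() yields None for trailing literal text
    }


def fill_prompt(template: str, **values: str) -> str:
    """Fill every placeholder, or raise KeyError listing the missing ones."""
    missing = template_fields(template) - set(values)
    if missing:
        raise KeyError(f"unfilled placeholders: {sorted(missing)}")
    return template.format(**values)


prompt = fill_prompt(
    "You are a data quality expert analyzing a {dataset_description} "
    "for {business_context}.",
    dataset_description="table of 2 million retail transactions",
    business_context="demand planning",
)
print(prompt)
```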
AI Prompts for Automated Data Cleaning and Preprocessing
Real-world data is messy. Missing values, inconsistent formats, and outliers can derail your analysis before it starts. These prompts help you identify cleaning strategies and automate repetitive preprocessing tasks.
You are a data quality expert analyzing a {dataset_description} for {business_context}. Examine the dataset characteristics and provide a structured data quality report that identifies: (1) missing value patterns by column, (2) potential outliers and anomalies, (3) inconsistent formats or values, (4) duplicate records, and (5) data type mismatches. For each issue found, rank its severity (high/medium/low) and suggest specific {programming_language} solutions with estimated effort required.
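This prompt works best when you paste in real profiling numbers instead of a verbal description of the dataset. Below is a minimal standard-library sketch. It assumes the data is loaded as a list of dicts (one per row) and that you supply the expected type for each column. It computes three of the five checks: missing values, exact duplicate records, and type mismatches. The function name `quality_report` and the rule that blank strings count as missing are illustrative assumptions. In pandas, `df.isna().sum()` and `df.duplicated().sum()` cover the first two checks.

```python
from collections import Counter


def quality_report(rows: list, expected_types: dict) -> dict:
    """Hypothetical helper: per-column missing counts and type mismatches,
    plus the number of exact duplicate rows."""
    report = {"missing": {}, "type_mismatches": {}, "duplicates": 0}
    for col, typ in expected_types.items():
        values = [row.get(col) for row in rows]
        # Treat None and blank strings as missing (NaN is not handled here).
        report["missing"][col] = sum(v is None or v == "" for v in values)
        # A present value of the wrong Python type counts as a mismatch.
        report["type_mismatches"][col] = sum(
            v not in (None, "") and not isinstance(v, typ) for v in values
        )
    # Count exact duplicate records beyond their first occurrence
    # (assumes all cell values are hashable).
    counts = Counter(tuple(sorted(row.items())) for row in rows)
    report["duplicates"] = sum(n - 1 for n in counts.values())
    return report


rows = [
    {"id": 1, "amount": 10.5, "city": "Oslo"},
    {"id": 2, "amount": None, "city": ""},
    {"id": 3, "amount": "12", "city": "Bergen"},
    {"id": 1, "amount": 10.5, "city": "Oslo"},
]
print(quality_report(rows, {"id": int, "amount": float, "city": str}))
```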
Acting as a data preprocessing specialist, recommend the optimal missing value handling strategy for {column_types} in a {dataset_description} intended for {target_analysis}. Consider the missing data mechanism (MCAR, MAR, MNAR), the percentage of missingness, and the downstream analysis requirements. Provide {programming_language} code examples for your recommended approach, including validation methods to assess the impact of your imputation strategy on the final analysis.
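To make the "validation methods" part of this prompt concrete, here is a minimal sketch of one simple strategy, median imputation, along with a quick check of how much the imputation distorts the column. Median imputation is generally only defensible when data is close to MCAR. The sketch illustrates the validation idea and is not a recommended default. The helper names are hypothetical.

```python
import statistics


def median_impute(values: list) -> list:
    """Replace None with the median of the observed values.
    Raises statistics.StatisticsError if every value is missing."""
    observed = [v for v in values if v is not None]
    fill = statistics.median(observed)
    return [fill if v is None else v for v in values]


def imputation_shift(before: list, after: list) -> dict:
    """Compare the imputed column with the observed-only column."""
    observed = [v for v in before if v is not None]
    return {
        # How far imputation moved the mean.
        "mean_shift": statistics.mean(after) - statistics.mean(observed),
        # Values below 1 mean imputation shrank the spread.
        "stdev_ratio": statistics.stdev(after) / statistics.stdev(observed),
    }


ages = [34, None, 29, 41, None, 38]
filled = median_impute(ages)
print(filled, imputation_shift(ages, filled))
```

A stdev ratio well below 1 is a typical side effect of filling many gaps with a single value. This is one reason to compare downstream model results with and without imputation.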
You are a statistical analyst working with {dataset_description} for {business_context}. Design a multi-layered outlier detection approach that combines statistical methods (Z-score, IQR, isolation forest) with domain-specific rules. For each column type in {column_types}, determine whether outliers should be removed, capped, transformed, or investigated further based on the {target_analysis} goals. Provide a {programming_language} implementation with clear decision criteria for each treatment option.
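Two of the three statistical methods named in this prompt fit in a few lines of standard-library Python. This sketch covers Z-score and IQR (Tukey's fences) only. Isolation forest is usually taken from scikit-learn's `IsolationForest` and is omitted here to keep the example dependency-free. The thresholds of 3.0 and 1.5 are conventional defaults, not recommendations for your data.

```python
import statistics


def zscore_outliers(values: list, threshold: float = 3.0) -> list:
    """Indices whose absolute Z-score (sample stdev) exceeds the threshold."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [i for i, v in enumerate(values) if abs(v - mean) / sd > threshold]


def iqr_outliers(values: list, k: float = 1.5) -> list:
    """Indices outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # three quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


data = [10, 11, 12, 11, 10, 12, 11, 95]
print(zscore_outliers(data), zscore_outliers(data, 2.0), iqr_outliers(data))
```

With only eight points, the default 3.0 Z-score cutoff can never fire. In a sample of n values, no point can sit more than (n - 1)/√n sample standard deviations from the mean, which is about 2.47 for n = 8. The IQR rule has no such limit, which is one reason to run several methods side by side.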
Create a comprehensive data standardization workflow for {dataset_description} that handles {data_issues} systematically. Design {programming_language} functions that automatically: (1) detect and standardize date/time formats, (2) clean and normalize text fields, (3) standardize categorical values, and (4) ensure consistent numerical formats. Include data validation checkpoints and error handling for edge cases commonly found in {business_context} datasets.
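The kind of functions this prompt asks for can start very small. Here is a standard-library sketch of date, text, and categorical standardization. The list of accepted date formats is a hypothetical example, and the day-first `%d/%m/%Y` entry is an assumption: an ambiguous value like 03/04/2024 will be read as 3 April. Unparseable dates return None so that a validation checkpoint can catch them.

```python
import re
from datetime import datetime
from typing import Optional

# Hypothetical format list; extend it with whatever your sources actually use.
DATE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%b %d, %Y", "%Y%m%d"]


def standardize_date(raw: str) -> Optional[str]:
    """Parse a date in any known format and return ISO 8601, or None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None  # Unparseable: route the record to a validation checkpoint.


def normalize_text(raw: str) -> str:
    """Collapse runs of whitespace, trim, and lowercase free text."""
    return re.sub(r"\s+", " ", raw).strip().lower()


def standardize_category(raw: str, mapping: dict) -> str:
    """Map known spelling variants (keys already normalized) to one label."""
    key = normalize_text(raw)
    return mapping.get(key, key)


print(standardize_date("05/03/2024"), standardize_category(" N.Y. ", {"n.y.": "new york"}))
```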
You are a data engineer building an automated preprocessing pipeline for {dataset_description} using {tool_preference}. Create a modular, reusable system that handles {data_issues} and prepares data for {target_analysis}. Include configuration parameters for different cleaning strategies, logging for audit trails, data quality metrics calculation, and automated testing to ensure pipeline reliability. Design the system to handle new data batches with minimal manual intervention.
AI Prompts for Feature Engineering
Creating meaningful features often makes the difference between a mediocre model and a breakthrough. Use these prompts when you need fresh perspectives on extracting value from your raw data.
You are a feature engineering specialist tasked with systematically generating candidate features for {dataset_description} to predict {target_variable}. Given existing features {existing_features}, create a comprehensive feature generation plan that includes: statistical transformations (aggregations, ratios, differences), mathematical operations (polynomials, logarithms, binning), and domain-agnostic patterns (frequency encoding, target encoding, clustering features). Organize your suggestions by transformation type and provide the rationale for each category's relevance to {model_type} performance.
Acting as a {domain_context} data scientist, leverage your industry expertise to engineer domain-specific features from {dataset_description} that predict {target_variable}. Consider business rules, seasonal patterns, customer behavior insights, and regulatory requirements specific to this domain. Create features that capture meaningful business concepts using {existing_features}, and explain how each feature relates to real-world {domain_context} dynamics that drive the target outcome.
You are designing time-based features for {dataset_description} containing temporal data to predict {target_variable}. Extract meaningful temporal patterns including: trend features (moving averages, velocity, acceleration), cyclical patterns (seasonal decomposition, day-of-week effects), lag features and rolling statistics, time-since-event calculations, and sequence-based features. Structure your approach to handle {computational_constraints} while maximizing predictive signal for {model_type}.
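For reference, here is a standard-library sketch of three of the temporal features this prompt lists: lag values, trailing rolling means, and velocity (first differences). The rolling mean deliberately excludes the current value. This matters when the series is the target itself, because including it would leak the answer into the feature. Positions without enough history are None. The helper names are hypothetical.

```python
def lag_feature(series: list, lag: int) -> list:
    """Value from `lag` steps earlier; None where no history exists."""
    return [series[i - lag] if i >= lag else None for i in range(len(series))]


def rolling_mean(series: list, window: int) -> list:
    """Mean of the previous `window` values, excluding the current one."""
    out = []
    for i in range(len(series)):
        if i < window:
            out.append(None)
        else:
            out.append(sum(series[i - window:i]) / window)
    return out


def velocity(series: list) -> list:
    """First difference: change since the previous step."""
    return [None] + [b - a for a, b in zip(series, series[1:])]


sales = [120, 135, 128, 150, 162]
print(lag_feature(sales, 1), rolling_mean(sales, 3), velocity(sales))
```

Applying `velocity` to the output of `velocity` (skipping the leading None) gives the acceleration feature mentioned in the prompt.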
As an advanced feature engineer, identify and create powerful feature interactions from {existing_features} in {dataset_description} to improve {target_variable} prediction. Focus on discovering non-linear relationships through: multiplicative interactions, conditional features (if-then logic), ratio-based combinations, and polynomial features. Prioritize interactions most likely to benefit {model_type} and provide a feature importance ranking methodology to validate the created interactions.
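The multiplicative and ratio interactions in this prompt are mechanical to generate. The hard part is deciding which ones to keep. This minimal sketch builds every pairwise product and ratio for a single row. It assumes numeric inputs and uses hypothetical `a_x_b` / `a_per_b` naming. Ratios with a zero denominator become None rather than raising.

```python
from itertools import combinations


def interaction_features(row: dict, names: list) -> dict:
    """Pairwise products and ratios for the given numeric features."""
    out = {}
    for a, b in combinations(names, 2):
        out[f"{a}_x_{b}"] = row[a] * row[b]
        # Guard the ratio against division by zero.
        out[f"{a}_per_{b}"] = row[a] / row[b] if row[b] != 0 else None
    return out


row = {"price": 10.0, "qty": 4.0, "discount": 0.0}
print(interaction_features(row, ["price", "qty", "discount"]))
```

The count grows quadratically: 50 features produce 1,225 pairs, or 2,450 new columns with both products and ratios. This is why the prompt pairs generation with an importance-ranking step.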
You are optimizing feature engineering specifically to achieve {performance_goal} for predicting {target_variable} using {model_type} on {dataset_description}. Design targeted features that directly address your performance objective by analyzing current model weaknesses, creating features that improve decision boundaries, reducing noise through strategic transformations, and engineering features that enhance model interpretability when needed. Provide a feature validation framework to measure impact on your specific performance metric.
AI Prompts for Model Selection and Hyperparameter Tuning (AutoML)
With so many algorithms available, choosing the right approach can be overwhelming. These prompts guide you through systematic model comparison and optimization strategies.
You are an ML engineering consultant selecting optimal algorithms for a new project. Given a {problem_type} task with {dataset_size} and {num_features}, recommend the 4 most suitable algorithms, ranked by expected performance. For each recommendation, explain its suitability for this data profile, its estimated training time, and its interpretability level. Structure each entry as: algorithm name, suitability score (1-10), key strengths, potential limitations, estimated training time, and interpretability level.
You are an AutoML expert optimizing {algorithm_name} for {problem_type} within {time_constraint}. Create a strategic hyperparameter search plan prioritizing the most impactful parameters first. Recommend search method (grid/random/Bayesian), provide optimal search ranges, and design a step-by-step tuning sequence that maximizes {target_metric} improvement per hour invested.
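Of the three search methods this prompt mentions, random search is the simplest to run yourself. This is a minimal standard-library sketch. The search space, its ranges, and `toy_objective` are hypothetical stand-ins: in practice the objective would train your model and return a cross-validated score for your target metric. Sampling the learning rate on a log scale is a common choice for parameters that span orders of magnitude.

```python
import math
import random

# Hypothetical search space: (distribution, low, high).
SEARCH_SPACE = {
    "learning_rate": ("log_uniform", 1e-3, 0.3),
    "max_depth": ("int", 2, 10),
    "subsample": ("uniform", 0.5, 1.0),
}


def sample_params(space: dict, rng: random.Random) -> dict:
    params = {}
    for name, (kind, low, high) in space.items():
        if kind == "log_uniform":
            params[name] = math.exp(rng.uniform(math.log(low), math.log(high)))
        elif kind == "int":
            params[name] = rng.randint(low, high)  # inclusive bounds
        else:
            params[name] = rng.uniform(low, high)
    return params


def random_search(objective, space: dict, n_trials: int, seed: int = 0):
    """Return the best (params, score) found; higher scores are better."""
    rng = random.Random(seed)
    best_params, best_score = None, -math.inf
    for _ in range(n_trials):
        params = sample_params(space, rng)
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_objective(params: dict) -> float:
    """Stand-in for 'train the model and return a validation score'."""
    return -abs(math.log10(params["learning_rate"]) + 1) - abs(params["max_depth"] - 6)


print(random_search(toy_objective, SEARCH_SPACE, n_trials=50))
```

Random search is often a reasonable baseline. When each trial is expensive, Bayesian optimization libraries such as Optuna aim to reach good settings in fewer trials.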
You are conducting a rigorous model comparison study. Design an experimental protocol to fairly evaluate {num_algorithms} algorithms for {problem_type} with {dataset_size}. Include standardized preprocessing, cross-validation strategy, comprehensive metrics beyond {target_metric}, and statistical significance testing. Create a decision matrix template capturing performance, efficiency, interpretability, and deployment complexity.
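Two pieces of this protocol are easy to pin down in code. The first is giving every algorithm exactly the same folds. The second is comparing per-fold scores pairwise. This is a minimal standard-library sketch with hypothetical helper names. The fold assignment is shuffled with a fixed seed so that it is reproducible across algorithms.

```python
import math
import random
import statistics


def kfold_indices(n: int, k: int, seed: int = 0):
    """Yield (train, test) index lists; the same seed gives the same folds."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = sorted(order[start:start + size])
        test_set = set(test)
        train = [i for i in range(n) if i not in test_set]
        yield train, test
        start += size


def paired_t_statistic(scores_a: list, scores_b: list) -> float:
    """t statistic of per-fold score differences (same folds for both models)."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))


print(paired_t_statistic([0.80, 0.82, 0.81], [0.78, 0.80, 0.80]))
```

Because the folds share training data, per-fold scores are not independent, and a plain paired t-test tends to overstate significance. Treat the statistic as a rough guide. To get a p-value, compare it against a t distribution with k - 1 degrees of freedom, for example with `scipy.stats.ttest_rel`.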
You are an ML consultant with limited {compute_resources} and {time_constraint} for {problem_type}. Recommend the highest ROI approach prioritizing algorithms with the best performance-to-effort ratio. Suggest time-saving techniques, identify which hyperparameters to tune versus accept defaults, and create a rapid prototyping workflow that delivers production-ready results within these constraints.
You are an ML solutions architect for {business_context} requiring {interpretability_need} and {deployment_constraint}. Evaluate suitable algorithms for {problem_type} across technical performance ({target_metric}), business requirements (compliance, explainability, latency), and operational factors (maintenance, monitoring, scalability). Provide an implementation roadmap that balances stakeholder needs with technical excellence.
AI Prompts for Predictive Modeling and Forecasting
Business stakeholders need reliable predictions they can act on. These prompts help you build robust forecasting models and communicate uncertainty effectively.
You are a data scientist building a forecasting model for {target_variable} over {time_horizon} for a {business_context} company. Using {available_data}, recommend the most appropriate modeling approach, explain why it fits this specific use case, and outline the key steps for implementation. Include model selection criteria that balance accuracy with interpretability for business stakeholders.
You are presenting forecasting results to {stakeholder_audience} who need to make critical business decisions about {target_variable} for {time_horizon}. Translate the statistical uncertainty (confidence intervals, prediction ranges) into business-friendly language, explain what different scenarios mean for their planning, and provide clear guidance on how to use these predictions despite inherent uncertainty.
You are preparing predictive features for forecasting {target_variable} in a {business_context} environment, considering {key_factors} and {business_constraints}. Identify the most valuable data sources and feature engineering techniques, prioritize features by expected impact, and create a systematic approach for feature validation that aligns with the {time_horizon} prediction window.
You are converting forecasting model outputs for {target_variable} into actionable business recommendations for {stakeholder_audience}. Transform the statistical predictions into specific operational guidance, quantify the business impact of different scenarios, and provide clear decision frameworks that account for {business_constraints} and help achieve measurable business outcomes.
You are validating a forecasting model for {target_variable} that will be used for {time_horizon} planning in a {business_context} setting. Design a comprehensive testing strategy that includes out-of-sample validation, stress testing against {key_factors}, and performance monitoring protocols. Ensure the model maintains reliability across different business conditions and provide early warning systems for model degradation.
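Out-of-sample validation for forecasts is usually done with a rolling-origin (walk-forward) backtest. You fit on data up to a cutoff, forecast the next few steps, move the cutoff forward, and repeat. In this minimal sketch, `forecast_fn` is a placeholder for your actual model, and `naive_forecast` is a hypothetical last-value baseline. Mean absolute error is used as an example metric.

```python
def rolling_origin_backtest(series: list, forecast_fn, initial: int, horizon: int) -> list:
    """Expanding-window backtest.

    forecast_fn(history, horizon) must return `horizon` predictions.
    Returns the mean absolute error at each forecast origin.
    """
    errors = []
    for origin in range(initial, len(series) - horizon + 1):
        history = series[:origin]                   # only data before the cutoff
        actual = series[origin:origin + horizon]
        predicted = forecast_fn(history, horizon)
        mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / horizon
        errors.append(mae)
    return errors


def naive_forecast(history: list, horizon: int) -> list:
    """Baseline: repeat the last observed value."""
    return [history[-1]] * horizon


demand = [100, 104, 98, 110, 115, 112, 120, 125]
print(rolling_origin_backtest(demand, naive_forecast, initial=4, horizon=2))
```

The same per-origin errors double as a simple degradation monitor. A sustained rise, especially relative to the naive baseline, is an early warning sign.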
AI Prompts for Natural Language Processing (NLP) and Text Analysis
Unstructured text contains valuable insights, but extracting them requires the right approach. Use these prompts when working with customer feedback, social media data, or document analysis.
You are a data analyst specializing in customer experience insights. Analyze the following {business_type} customer feedback data: {text_data} collected over {time_period}. Focus specifically on {specific_focus} and provide: (1) overall sentiment breakdown with percentages, (2) top 5 recurring themes with supporting quotes, (3) actionable recommendations for improvement, and (4) priority issues that need immediate attention.
You are a brand monitoring specialist tracking {brand_name} mentions across {platform}. Analyze these social media posts: {social_media_posts} in the context of {campaign_context}. Provide a comprehensive sentiment analysis including: (1) sentiment distribution (positive/negative/neutral percentages), (2) key topics driving each sentiment, (3) influential posts or trending conversations, and (4) recommended response strategy for negative sentiment.
You are a market research analyst processing open-ended survey responses about {survey_topic} from {target_audience}. Analyze these responses: {survey_responses} to inform {business_goal}. Deliver: (1) thematic categorization with response counts, (2) sentiment analysis by theme, (3) notable verbatim quotes representing each theme, and (4) strategic insights that directly address the business goal.
You are an information extraction specialist working with {document_type} documents. Extract and analyze the following content: {document_text} focusing on {extraction_focus}. Structure your analysis in {output_format} and include: (1) key information organized by relevance, (2) important dates, numbers, or entities identified, (3) potential risks or opportunities highlighted, and (4) summary of critical points that require action or follow-up.
You are a competitive intelligence analyst in the {industry} sector. Analyze this content from {competitor_name}: {competitor_content} to uncover strategic insights about {analysis_focus}. Provide: (1) key strategic themes and messaging patterns, (2) competitive positioning and differentiation claims, (3) market opportunities or threats revealed, and (4) actionable intelligence for strategic planning with supporting evidence from the text.
AI Prompts for Synthetic Data Generation
When real data is limited or privacy is a concern, synthetic data can fill the gap. These prompts guide you through generating realistic datasets that preserve statistical properties.
You are a data scientist specializing in synthetic data generation. Create a realistic synthetic dataset for {data_type} with {record_count} records containing {key_attributes}. Preserve the statistical distributions and correlations from this sample: {original_data_sample}. Generate the data in {output_format} format, ensuring {specific_constraints} are maintained, and include validation steps to verify statistical similarity to the original dataset.
You are a privacy engineer creating {compliance_type}-compliant synthetic data for {industry_domain}. Generate {record_count} synthetic {data_entity} records that maintain statistical utility while minimizing re-identification risk. The synthetic data should preserve {key_statistical_properties} from the original dataset, include realistic {domain_specific_fields}, and be produced by a process that satisfies differential privacy with a privacy budget (epsilon) of {privacy_budget}.
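Differential privacy is a guarantee about the process that produces a release. It cannot be confirmed by inspecting the output afterwards. Records typed out by a chat model carry no such guarantee on their own. The classic building block is the Laplace mechanism: add noise with scale sensitivity / epsilon to a query result. This minimal sketch applies it to a single count, whose sensitivity is 1 because adding or removing one person changes it by at most 1. It samples Laplace noise as the difference of two exponential draws, a standard identity, since Python's `random` module has no Laplace sampler. It is an illustration of the mechanism, not an audited privacy library.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) sample: difference of two Exp(mean=scale) draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Release a count with epsilon-differential privacy.

    A counting query changes by at most 1 when one person is added or
    removed, so its sensitivity is 1 and the noise scale is 1 / epsilon.
    """
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(7)
print([round(private_count(1000, eps, rng), 1) for eps in (0.1, 1.0, 10.0)])
```

A smaller epsilon means more noise and stronger privacy. At epsilon = 0.1 the typical error on this count is around 10, while at epsilon = 10 it is around 0.1.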
You are a time series analyst generating synthetic {data_source} data spanning {time_period} with {sampling_frequency} intervals. Create realistic synthetic data that preserves seasonal patterns, trends, and anomaly distributions observed in {original_pattern_description}. Include {environmental_factors} that influence the patterns, ensure temporal consistency, and generate both normal operation data and {anomaly_types} for comprehensive testing scenarios.
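A local generator with known, labeled anomalies is also handy for checking whether a detector catches what it should. This minimal sketch combines a linear trend, one sinusoidal seasonal cycle, Gaussian noise, and randomly injected spikes. All parameter names and defaults are hypothetical, and real seasonal patterns are rarely a single clean sine wave.

```python
import math
import random


def synthetic_series(n, period, trend=0.05, amplitude=10.0, noise=1.0,
                     anomaly_rate=0.01, anomaly_size=8.0, seed=0):
    """Trend + sinusoidal seasonality + Gaussian noise, with labeled spikes.

    Spikes are anomaly_size noise standard deviations up or down.
    Returns (values, is_anomaly) so a detector can be scored against labels.
    """
    rng = random.Random(seed)
    values, labels = [], []
    for t in range(n):
        base = 100 + trend * t + amplitude * math.sin(2 * math.pi * t / period)
        value = base + rng.gauss(0, noise)
        is_anomaly = rng.random() < anomaly_rate
        if is_anomaly:
            value += rng.choice([-1, 1]) * anomaly_size * noise
        values.append(value)
        labels.append(is_anomaly)
    return values, labels


hourly, flags = synthetic_series(24 * 30, period=24, seed=1)  # 30 days, hourly
print(len(hourly), sum(flags))
```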
You are a machine learning engineer creating synthetic training data for a {ml_task_type} model predicting {target_variable}. Generate {additional_record_count} synthetic samples that expand the diversity of the existing dataset while maintaining class balance for {class_distribution}. Focus on creating edge cases and underrepresented scenarios in {specific_feature_spaces}, and ensure the synthetic data improves model robustness without introducing bias.
You are a systems engineer generating realistic synthetic data for load testing {system_type}. Create {data_volume} of synthetic {transaction_type} data that mimics real user behavior patterns including {usage_patterns}. The data should include realistic {data_attributes}, follow {business_rules}, simulate {peak_load_scenarios}, and enable testing of system performance under {stress_conditions} while maintaining referential integrity.
AI Prompts for Automated Report Generation and Visualization
Turning analysis into actionable insights requires clear communication. These prompts help you create compelling visualizations and reports that resonate with different audiences.
You are a data visualization consultant creating a {report_type} for {audience}. Using {data_source}, identify the 3-5 most critical {key_metrics} for {time_period} that directly impact {business_objective}. Create a dashboard structure with clear visual hierarchy, executive summary insights, and specific recommendations that can be acted upon within 30 days.
You are a data analyst preparing an operational report for {audience} using {data_source}. Generate a comprehensive analysis of {key_metrics} over {time_period}, including trend identification, anomaly detection, and root cause analysis. Structure your findings with technical details, statistical significance, and actionable optimization recommendations for {business_objective}.
You are a client success manager creating a {report_type} presentation for {audience}. Transform {data_source} into a compelling narrative that highlights {key_metrics} achievements during {time_period}. Focus on ROI demonstration, progress toward {business_objective}, and strategic next steps, using visual storytelling that builds confidence and drives continued partnership.
You are a compliance analyst generating a formal {report_type} for {audience} using {data_source}. Create a structured report covering {key_metrics} for {time_period} that meets regulatory standards for {business_objective}. Include methodology explanations, data validation procedures, and ensure all visualizations follow industry compliance guidelines with proper documentation trails.
You are a marketing analyst evaluating campaign performance using {data_source} for {audience}. Analyze {key_metrics} across {time_period} to measure success against {business_objective}. Create visual comparisons, attribution analysis, and budget allocation recommendations with clear before/after comparisons and forecasted impact for future campaigns.
AI Prompts for Anomaly Detection
Spotting unusual patterns can prevent fraud, identify system failures, or uncover hidden opportunities. Use these prompts to build robust detection systems and investigate anomalies effectively.
You are a fraud detection specialist designing a monitoring system for {industry} transactions. Analyze the provided {data_type} from {time_period} and create a detection framework that identifies suspicious patterns while maintaining {threshold_level} false positive rates. Provide specific rules, statistical thresholds, and real-time triggers that would flag {anomaly_type} for immediate investigation, including the business justification for each detection criterion.
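The "{threshold_level} false positive rates" requirement translates directly into how you set a threshold. Score a sample of transactions known to be legitimate, then pick the cutoff so that only the allowed fraction of them would be flagged. In this minimal sketch, the amount-based score is a hypothetical example of one statistical rule. Any score where higher means more suspicious works the same way.

```python
import statistics


def threshold_for_fpr(legit_scores: list, target_fpr: float) -> float:
    """Cutoff that flags (score > cutoff) at most target_fpr of legit cases."""
    ranked = sorted(legit_scores, reverse=True)
    allowed = int(target_fpr * len(ranked))  # number of false alarms we accept
    # Scores strictly above the (allowed+1)-th highest legit score get flagged.
    return ranked[allowed] if allowed < len(ranked) else float("-inf")


def amount_score(amount: float, history: list) -> float:
    """How many standard deviations an amount sits above the customer's norm."""
    mean = statistics.mean(history)
    sd = statistics.stdev(history)
    return (amount - mean) / sd if sd > 0 else 0.0


history = [42.0, 38.5, 55.0, 47.2, 40.1]
print(amount_score(310.0, history))
```

Recompute the threshold on fresh legitimate data periodically. Customer behavior drifts, and a fixed cutoff will slowly stop matching the target rate.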
Acting as a system reliability engineer, investigate unusual patterns in {data_type} over {time_period} compared to {baseline_period} baseline performance. Identify the top 3 most critical anomalies, rank them by potential business impact, and provide a systematic troubleshooting approach for each. Include specific metrics to monitor, escalation triggers, and preventive measures to avoid similar issues.
You are a business intelligence analyst looking for growth opportunities hidden in unusual patterns. Examine {data_type} from {time_period} to identify positive anomalies that could represent untapped market potential, emerging trends, or operational improvements. For each opportunity identified, provide the potential business value, implementation complexity, and recommended next steps to capitalize on the insight.
As a data forensics expert, perform a comprehensive root cause analysis on the detected {anomaly_type} in {data_type} during {time_period}. Use a systematic approach to trace the anomaly back to its source, identify contributing factors, assess the scope of impact, and determine whether this represents a systemic issue requiring process changes or an isolated incident needing specific remediation.
You are designing an intelligent anomaly detection system for {industry} that processes {data_type} using {detection_method} approaches. Create a framework that automatically adapts to seasonal patterns, reduces false positives over time, and scales with data volume growth. Include specific algorithms, feedback loops for continuous learning, performance metrics for system effectiveness, and integration points with existing {industry} workflows.
AI Prompts for Computer Vision for Image/Video Analysis
Visual data analysis opens up new possibilities across industries. These prompts help you tackle image classification, object detection, and video analysis challenges.
You are a computer vision engineer designing an object detection solution for {industry_domain}. Create a comprehensive implementation plan to detect {objects_to_detect} in {image_type} with {performance_requirements}. Include model architecture recommendations, data preprocessing steps, training strategy for {dataset_size}, and deployment considerations for {technical_constraints}. Provide specific metrics to track and potential challenges with mitigation strategies.
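One metric almost every object detection plan tracks is intersection over union (IoU) between predicted and ground-truth boxes. Precision and recall at an IoU cutoff, and mean average precision, are built on top of it. This minimal sketch assumes axis-aligned boxes given as (x1, y1, x2, y2) coordinates with x2 > x1 and y2 > y1.

```python
def iou(box_a: tuple, box_b: tuple) -> float:
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    # Corners of the overlapping rectangle (may be empty).
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


print(iou((10, 10, 50, 50), (30, 30, 70, 70)))
```

A common convention, used for example in the PASCAL VOC benchmark, counts a detection as correct when its IoU with a ground-truth box is at least 0.5.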
Acting as a manufacturing AI specialist, develop an automated quality control system that analyzes {image_type} to identify {objects_to_detect} with {performance_requirements} accuracy. Design the complete workflow including image acquisition setup, preprocessing pipeline, defect classification categories, and {output_format} for production teams. Address lighting conditions, camera positioning, and integration with existing {technical_constraints}.
You are a medical AI researcher creating a diagnostic support tool for analyzing {image_type} to detect {objects_to_detect}. Design a robust analysis pipeline that meets {performance_requirements} while handling {dataset_size} training samples. Include data augmentation strategies, model interpretability features for clinicians, regulatory compliance considerations, and {output_format} that supports medical decision-making in {industry_domain}.
As a machine learning optimization expert, analyze and improve an existing computer vision model that processes {image_type} for {objects_to_detect} detection. The current system needs to achieve {performance_requirements} while operating under {technical_constraints}. Provide specific optimization techniques for model architecture, data pipeline efficiency, inference speed, and resource utilization. Include A/B testing methodology and performance monitoring strategies.
You are a video analytics architect building a real-time analysis system for {image_type} streams to track {objects_to_detect} across frames. Design an end-to-end pipeline that processes video data to meet {performance_requirements} while managing {technical_constraints}. Include temporal consistency methods, batch processing optimization, storage strategies for {dataset_size}, and {output_format} for stakeholders in {industry_domain}.
Conclusion
These ChatGPT prompts make it easier for data scientists to approach their daily challenges. From automating tedious data cleaning tasks to generating creative feature engineering ideas, AI assistance helps you focus on high-value analytical thinking rather than routine implementation.
The key is using these prompts as starting points for your own analysis. Each dataset and business problem is unique, so adapt the suggestions to fit your specific context. With practice, you'll develop your own prompt library that accelerates your workflow and enhances your analytical capabilities.
Start with the use cases most relevant to your current projects, then gradually incorporate prompts from other areas as your needs evolve. Your future self will thank you for the time saved and insights gained.
Also check out our best prompts for data engineers.