Data Analysis with Python: A Jupyter Notebook Example
This post demonstrates how to include Jupyter notebook content in your blog. Here’s an example data analysis workflow.
Introduction
In this notebook, we’ll analyze some randomly generated sample data using Python with pandas, NumPy, matplotlib, and seaborn.
Setup and Imports
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
# Set up plotting style
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")
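The `seaborn-v0_8` style name is only available in newer matplotlib releases (the seaborn styles were renamed in matplotlib 3.6), so the setup cell can fail on older installs. A minimal sketch of a fallback, assuming you want the notebook to run in both cases; the helper name `pick_style` is hypothetical and not part of matplotlib:

```python
import matplotlib.pyplot as plt

def pick_style(preferred=("seaborn-v0_8", "seaborn"), fallback="default"):
    """Return the first preferred style this matplotlib install knows about."""
    available = set(plt.style.available)
    for name in preferred:
        if name in available:
            return name
    # "default" is always accepted by plt.style.use, even if not listed
    return fallback

style = pick_style()
plt.style.use(style)
print("Using matplotlib style:", style)
```

The rest of the notebook does not depend on which style is picked; only the look of the plots changes.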
Load and Explore Data
# Create sample data
np.random.seed(42)
data = {
'date': pd.date_range('2023-01-01', periods=100, freq='D'),
'sales': np.random.normal(1000, 200, 100),
'visitors': np.random.normal(500, 100, 100),
'category': np.random.choice(['A', 'B', 'C'], 100)
}
df = pd.DataFrame(data)
# Derived column: sales per visitor (called conversion_rate in this example)
df['conversion_rate'] = df['sales'] / df['visitors']
print("Dataset shape:", df.shape)
df.head()
Output:
Dataset shape: (100, 5)
        date        sales    visitors category  conversion_rate
0 2023-01-01   996.714153  426.967649        A         2.334225
1 2023-01-02   881.730204  563.407932        C         1.565011
2 2023-01-03  1291.544472  695.455727        C         1.856892
3 2023-01-04   953.177555  423.586466        A         2.249803
4 2023-01-05   865.408363  544.883183        C         1.588313
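Because `visitors` is drawn from a normal distribution, nothing in the code rules out zero or negative values, and dividing by them would produce infinite or meaningless ratios. A minimal sketch of a guarded division, assuming non-positive visitor counts should simply become missing values; the helper name `safe_ratio` and the three-row example frame are hypothetical:

```python
import pandas as pd

def safe_ratio(numerator: pd.Series, denominator: pd.Series) -> pd.Series:
    """Divide two Series, returning NaN wherever the denominator is <= 0."""
    # where() keeps values that satisfy the condition and sets the rest to NaN
    valid_denominator = denominator.where(denominator > 0)
    return numerator / valid_denominator

# Tiny made-up frame with one zero and one negative visitor count
example = pd.DataFrame({
    "sales": [1000.0, 900.0, 1200.0],
    "visitors": [500.0, 0.0, -20.0],
})
example["conversion_rate"] = safe_ratio(example["sales"], example["visitors"])
print(example)
```

With the parameters used in this post (mean 500, standard deviation 100) such values are very unlikely, so the guard mostly matters if you change the distribution or switch to real data.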
Data Visualization
# Create subplots
fig, axes = plt.subplots(2, 2, figsize=(12, 10))
# Sales over time
axes[0,0].plot(df['date'], df['sales'], alpha=0.7)
axes[0,0].set_title('Sales Over Time')
axes[0,0].set_xlabel('Date')
axes[0,0].set_ylabel('Sales')
axes[0,0].tick_params(axis='x', rotation=45)  # keep date labels from overlapping
# Visitors by category
category_visitors = df.groupby('category')['visitors'].mean()
axes[0,1].bar(category_visitors.index, category_visitors.values)
axes[0,1].set_title('Average Visitors by Category')
axes[0,1].set_xlabel('Category')
axes[0,1].set_ylabel('Average Visitors')
# Conversion rate distribution
axes[1,0].hist(df['conversion_rate'], bins=20, alpha=0.7)
axes[1,0].set_title('Conversion Rate Distribution')
axes[1,0].set_xlabel('Conversion Rate')
axes[1,0].set_ylabel('Frequency')
# Correlation heatmap (fixed -1..1 color scale so 0 maps to the neutral color)
correlation_data = df[['sales', 'visitors', 'conversion_rate']].corr()
im = axes[1,1].imshow(correlation_data, cmap='coolwarm', vmin=-1, vmax=1, aspect='auto')
fig.colorbar(im, ax=axes[1,1])  # legend for the color scale
axes[1,1].set_title('Correlation Matrix')
axes[1,1].set_xticks(range(len(correlation_data.columns)))
axes[1,1].set_yticks(range(len(correlation_data.columns)))
axes[1,1].set_xticklabels(correlation_data.columns, rotation=45)
axes[1,1].set_yticklabels(correlation_data.columns)
plt.tight_layout()
plt.show()
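The bar chart above shows only the mean per category. A sketch of how the same groupby could also report the spread and group size, which helps judge whether differences between bars are more than noise. The sample data is regenerated here so the block runs on its own; the seed and the `sem` column name are assumptions for illustration:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # assumption: any seed works here
sample = pd.DataFrame({
    "visitors": rng.normal(500, 100, 100),
    "category": rng.choice(["A", "B", "C"], 100),
})

# Mean, standard deviation, and group size per category
summary = sample.groupby("category")["visitors"].agg(["mean", "std", "count"])
# Standard error of the mean: how much each bar would wobble on resampling
summary["sem"] = summary["std"] / np.sqrt(summary["count"])
print(summary.round(2))
```

If the gaps between category means are of the same order as the `sem` values, the categories are effectively indistinguishable.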
Statistical Analysis
# Summary statistics
print("Summary Statistics:")
print(df[['sales', 'visitors', 'conversion_rate']].describe())
# Correlation analysis
print("\nCorrelation Matrix:")
print(correlation_data)
Output:
Summary Statistics:
             sales    visitors  conversion_rate
count   100.000000  100.000000       100.000000
mean    973.633870  499.313471         1.969167
std     217.355072   98.063616         0.567890
min     444.175957  274.052987         0.927181
25%     824.018703  434.077389         1.590983
50%     984.016525  499.359181         1.933964
75%    1138.651039  565.743736         2.318605
max    1564.007081  759.375021         3.847052
Correlation Matrix:
                    sales  visitors  conversion_rate
sales            1.000000  0.018739         0.894123
visitors         0.018739  1.000000        -0.748925
conversion_rate  0.894123 -0.748925         1.000000
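Each entry of the correlation matrix is a Pearson coefficient: the covariance of two columns divided by the product of their standard deviations. A minimal sketch that computes it by hand and compares it with pandas, using a small made-up frame rather than the notebook's data; the function name `pearson_r` is hypothetical:

```python
import numpy as np
import pandas as pd

def pearson_r(x, y) -> float:
    """Pearson correlation: sum(dx*dy) / sqrt(sum(dx^2) * sum(dy^2))."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations from the mean
    dy = y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

# Made-up values chosen so the arithmetic is easy to follow
toy = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 5.0],
                    "y": [2.0, 1.0, 4.0, 3.0, 5.0]})

by_hand = pearson_r(toy["x"], toy["y"])
from_pandas = toy["x"].corr(toy["y"])  # pandas uses Pearson by default
print(by_hand, from_pandas)  # both are 0.8 up to floating-point rounding
```

Here the deviations multiply to a sum of 8 and each column has a sum of squared deviations of 10, giving 8 / 10 = 0.8.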
Key Insights
From our analysis, we can observe:
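- Sales Trend: Sales fluctuate around an average of ~974 units (standard deviation ~217) with no systematic trend, as expected for independently drawn random values
- Category Performance: Average visitor counts are similar across categories, which is expected because categories were assigned at random
- Conversion Rate: There’s a strong positive correlation (0.89) between sales and conversion rate. This is largely built in, because conversion rate is computed as sales / visitors
- Visitors Impact: The negative correlation (-0.75) between visitors and conversion rate has the same cause: visitors is the denominator of the ratio. Sales and visitors themselves are essentially uncorrelated (0.02)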
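The correlations involving `conversion_rate` can arise purely from how the column is built: it is `sales / visitors`, so it shares information with both inputs even when the inputs are unrelated. A minimal sketch of that effect, assuming two independent normal variables with the same parameters as the sample data but a larger sample and a different random generator, so the exact numbers will differ from the tables above:

```python
import numpy as np

rng = np.random.default_rng(0)  # assumption: seed chosen only for repeatability
n = 10_000

sales = rng.normal(1000, 200, n)
visitors = rng.normal(500, 100, n)  # drawn independently of sales
ratio = sales / visitors            # same construction as conversion_rate

# np.corrcoef returns a 2x2 matrix; [0, 1] is the off-diagonal coefficient
r_sales_visitors = np.corrcoef(sales, visitors)[0, 1]
r_sales_ratio = np.corrcoef(sales, ratio)[0, 1]
r_visitors_ratio = np.corrcoef(visitors, ratio)[0, 1]

print(f"sales vs visitors: {r_sales_visitors:+.2f}")
print(f"sales vs ratio:    {r_sales_ratio:+.2f}")
print(f"visitors vs ratio: {r_visitors_ratio:+.2f}")
```

The first coefficient stays near zero while the other two are clearly positive and negative respectively, so a ratio correlating with its own numerator or denominator says little about the underlying process.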
Conclusion
This example demonstrates how to present Jupyter notebook content in a blog post format. The combination of code, outputs, and explanations makes for engaging technical content.
You can find the original notebook file in the repository or create your own following this structure.