🔬 Active Research Project

The Mean Kitchen

A data science project that analyzes statistical associations in recipes to create "mean recipes": statistically derived recipes built from the analysis of thousands of real recipes.

Currently analyzed: 9,922 recipes

Project Overview

The Mean Kitchen (TMK) combines quantitative ingredient analysis with pattern-based instruction analysis to generate insights about cooking patterns and recipe optimization.

The project uses a two-pronged approach: analyzing ingredient frequencies and quantities to find statistical patterns, while also examining cooking instructions to identify common techniques and sequences.

TMK has processed and analyzed 9,922 recipes from the Food.com dataset, extracting statistical insights from 1,350+ unique ingredients and identifying 2,365+ significant ingredient correlations.

Capabilities

Comprehensive analysis tools for recipe data

📊

Quantitative Analysis

Statistical analysis of ingredient frequencies, quantities, and correlations across thousands of recipes

🔬

Pattern Recognition

Pattern-based instruction analysis to identify common cooking techniques and sequences
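
As a rough illustration of what pattern-based instruction analysis can involve, the sketch below extracts technique verbs from plain-text instructions and counts which technique tends to follow which. It is a minimal standard-library sketch, not the project's implementation: the `TECHNIQUES` vocabulary, the function names, and the sample recipes are all hypothetical, and a real pipeline would more likely rely on an NLP library than on a fixed regular expression.

```python
import re
from collections import Counter

# Hypothetical vocabulary of technique verbs (an assumption for this sketch).
TECHNIQUES = ("preheat", "mix", "whisk", "fold", "bake", "simmer", "chill")
_PATTERN = re.compile(r"\b(" + "|".join(TECHNIQUES) + r")\b", re.IGNORECASE)


def technique_sequence(instructions):
    """Return the known technique verbs in the order they appear in the text."""
    return [m.group(1).lower() for m in _PATTERN.finditer(instructions)]


def common_transitions(recipes):
    """Count consecutive technique pairs (bigrams) across all recipes."""
    counts = Counter()
    for text in recipes:
        seq = technique_sequence(text)
        counts.update(zip(seq, seq[1:]))
    return counts


# Made-up sample instructions, for illustration only.
recipes = [
    "Preheat the oven. Mix flour and sugar. Bake for 30 minutes.",
    "Preheat oven to 350. Whisk the eggs, then mix in butter. Bake until golden.",
]
transitions = common_transitions(recipes)
print(transitions.most_common(3))
```

Counting pairs rather than single verbs is what lets the analysis talk about sequences: a pair such as `("mix", "bake")` that recurs across many recipes suggests a common step order.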

🎯

Statistical Recipes

Generate mean, median, and mode recipes based on analysis of large recipe datasets
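
One way to picture a "statistical recipe" is to collect every quantity recorded for an ingredient and summarize it with the mean, median, and mode. The sketch below does this with Python's standard `statistics` module. It assumes quantities have already been normalized to a single unit (grams here), and the `min_share` cutoff, the function name, and the sample data are hypothetical choices for illustration, not details taken from the project.

```python
from statistics import mean, median, multimode


def statistical_recipe(recipes, min_share=0.5):
    """Summarize ingredients that appear in at least `min_share` of recipes.

    `recipes` is a list of dicts mapping ingredient name -> quantity in grams
    (unit normalization is assumed to have happened upstream).
    """
    quantities = {}
    for recipe in recipes:
        for ingredient, grams in recipe.items():
            quantities.setdefault(ingredient, []).append(grams)

    summary = {}
    for ingredient, values in quantities.items():
        if len(values) / len(recipes) < min_share:
            continue  # too rare to belong in the "typical" recipe
        summary[ingredient] = {
            "mean": mean(values),
            "median": median(values),
            "mode": multimode(values),  # list, since several values can tie
        }
    return summary


# Made-up sample data, for illustration only.
recipes = [
    {"sugar": 200, "butter": 100, "eggs": 100},
    {"sugar": 150, "butter": 100},
    {"sugar": 250, "butter": 130, "nutmeg": 2},
]
result = statistical_recipe(recipes)
for name, stats in result.items():
    print(name, stats)
```

The three summaries can disagree: the mean is pulled by unusually large batches, while the median and mode stay closer to what a typical recipe actually uses.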

📈

Data Visualization

Interactive charts and graphs showing ingredient relationships and cooking patterns

🗄️

Large Dataset Processing

Analyzed 9,922 recipes from Food.com with an 89.2% import success rate

🤖

ML Integration

Machine learning for outlier detection and recipe recommendations
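
The page does not say which outlier-detection method the project uses. As a simple statistical stand-in (not a machine-learning model), the sketch below flags values using the modified z-score based on the median absolute deviation, with the conventional 0.6745 scaling constant and 3.5 cutoff. The function name and the sample cooking times are hypothetical.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds `threshold`."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread to measure against
    return [
        i
        for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]


# Made-up cooking times in minutes; the last entry is a likely data-entry error.
cook_minutes = [45, 60, 55, 62, 70, 58, 600]
flagged = mad_outliers(cook_minutes)
print(flagged)
```

A median-based rule is used here because a single extreme value inflates the ordinary mean and standard deviation, which can hide the very outlier being searched for.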

Sample Results

Statistical Dessert Recipe (based on 2,253 recipes)

  • Sugar: 74.1% frequency (appears in 1,669 recipes)
  • Butter: 56.1% frequency (appears in 1,263 recipes)
  • Eggs: 48.2% frequency (appears in 1,085 recipes)
  • Strongest correlation: Nutmeg + Cinnamon (0.389 correlation coefficient)
  • Average cooking time: 62 minutes
  • Confidence score: 100% (high data quality)
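
The frequency figures above follow directly from the counts: each percentage is the number of recipes containing the ingredient divided by 2,253. The sketch below reproduces that arithmetic and shows one common way to correlate two ingredients, the phi coefficient (Pearson correlation on presence/absence indicators). Which coefficient the project actually reports is not stated, so `phi` is an assumption, and the four sample recipes are made up.

```python
from math import sqrt

# Frequencies recomputed from the counts reported above.
DESSERT_TOTAL = 2253
for name, count in {"Sugar": 1669, "Butter": 1263, "Eggs": 1085}.items():
    print(f"{name}: {round(100 * count / DESSERT_TOTAL, 1)}%")


def frequency(recipes, ingredient):
    """Share of recipes (as sets of ingredient names) containing `ingredient`."""
    return sum(ingredient in r for r in recipes) / len(recipes)


def phi(recipes, a, b):
    """Phi coefficient between the presence of ingredients `a` and `b`."""
    n11 = sum(a in r and b in r for r in recipes)          # both present
    n10 = sum(a in r and b not in r for r in recipes)      # only a
    n01 = sum(a not in r and b in r for r in recipes)      # only b
    n00 = sum(a not in r and b not in r for r in recipes)  # neither
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0


# Made-up sample data, for illustration only.
recipes = [
    {"sugar", "butter", "cinnamon", "nutmeg"},
    {"sugar", "cinnamon", "nutmeg"},
    {"sugar", "butter"},
    {"butter", "eggs", "cinnamon"},
]
print(frequency(recipes, "sugar"))
print(phi(recipes, "cinnamon", "nutmeg"))
```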

9,922 Recipes Analyzed
1,350+ Unique Ingredients
2,365+ Correlations Found
89.2% Import Success Rate

Technology Stack

Python, R, PostgreSQL, pandas, NumPy, scikit-learn, spaCy, SQLAlchemy, Plotly, Jupyter