You’re already thinking in a structured way about categorizing the data, which will be very useful when converting raw sensor data into meaningful narratives. Now, let’s break down the preprocessing strategy step by step.
You’ve categorized your data into four stability levels:
- Highly Periodic (e.g., solar radiation)
- Statistically Predictable (e.g., air/soil temperature)
- Semi-Chaotic (e.g., precipitation, wind)
- Fully Chaotic (completely unpredictable phenomena)
Each of these categories will require a different type of preprocessing, so the LLM can interpret the data appropriately.
- Highly Periodic data
  - Strategy: Identify deviations from the usual pattern.
  - Processing:
    - Compute daily solar radiation curves (e.g., sunrise-to-sunset pattern).
    - Detect anomalies, e.g., cloud cover reducing radiation below the expected curve.
    - Generate insights: “Today had less sunshine than expected, possibly due to overcast conditions.”
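As a minimal sketch of the deviation check for periodic data, the example below models the expected clear-sky curve as a half-sine between sunrise and sunset. The function names, the 900 W/m² peak, the 06:00 to 18:00 daylight window, and the 15% threshold are all illustrative assumptions, not values from your sensors; in practice the expected curve would come from your own historical data.

```python
import math

def expected_radiation(hour, sunrise=6.0, sunset=18.0, peak=900.0):
    """Idealised clear-sky curve (assumption): half-sine between sunrise and sunset."""
    if hour <= sunrise or hour >= sunset:
        return 0.0
    return peak * math.sin(math.pi * (hour - sunrise) / (sunset - sunrise))

def solar_deficit(readings, **curve):
    """readings: list of (hour, W/m^2). Returns the fractional shortfall vs. the expected curve."""
    expected_total = sum(expected_radiation(h, **curve) for h, _ in readings)
    if expected_total == 0:
        return 0.0  # no daylight readings, nothing to compare
    observed_total = sum(v for _, v in readings)
    return 1.0 - observed_total / expected_total

def solar_insight(readings, threshold=0.15, **curve):
    """Turn the shortfall into a sentence the LLM can use directly."""
    deficit = solar_deficit(readings, **curve)
    if deficit > threshold:
        return (f"Today had {deficit:.0%} less sunshine than expected, "
                "possibly due to overcast conditions.")
    return "Solar radiation followed the expected daily curve."
```

Passing the 96 daily `(hour, value)` pairs to `solar_insight` yields one sentence per day, which is usually all the LLM needs for this category.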
- Statistically Predictable data
  - Strategy: Compare against seasonal norms.
  - Processing:
    - Retrieve the historical average for the same day of the year.
    - Compute a z-score or percent deviation from expected values.
    - Generate insights: “The soil temperature was 3°C higher than the seasonal norm, indicating an early warming trend.”
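The seasonal comparison can be sketched as follows, assuming you already have a list of past readings for the same day of the year. The function names and the output wording are hypothetical; only the z-score arithmetic is standard.

```python
from statistics import mean, stdev

def seasonal_deviation(today_value, historical_values):
    """Compare today's value with past readings for the same day of year."""
    norm = mean(historical_values)
    spread = stdev(historical_values)  # sample std dev; needs at least two values
    delta = today_value - norm
    z = delta / spread if spread > 0 else 0.0
    return {"norm": norm, "delta": delta, "z_score": z}

def describe_deviation(label, today_value, historical_values, unit="°C"):
    """Format the deviation as a sentence for the LLM."""
    stats = seasonal_deviation(today_value, historical_values)
    direction = "higher" if stats["delta"] >= 0 else "lower"
    return (f"The {label} was {abs(stats['delta']):.1f}{unit} {direction} than the "
            f"seasonal norm (z = {stats['z_score']:.1f}).")
```

The z-score is worth keeping alongside the raw difference: a 3°C gap matters far more for a variable that normally varies by 1°C than for one that varies by 5°C.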
- Semi-Chaotic data
  - Strategy: Detect significant changes and patterns.
  - Processing:
    - Use rolling averages or moving standard deviations.
    - Detect abrupt changes compared to the past 7 to 30 days.
    - Generate insights: “A sudden wind gust of 50 km/h was recorded, breaking the usual calm pattern.”
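One way to detect abrupt changes is to compare each day against a trailing window of previous days. This is a sketch under assumptions: `abrupt_changes` is a hypothetical helper, and the 7-day window and 3-sigma cutoff are starting points to tune, not recommendations derived from your data.

```python
from statistics import mean, stdev

def abrupt_changes(daily_values, window=7, n_sigmas=3.0):
    """Return indices whose value departs from the trailing-window mean
    by more than n_sigmas standard deviations of that window."""
    flagged = []
    for i in range(window, len(daily_values)):
        baseline = daily_values[i - window:i]  # the previous `window` days only
        mu, sigma = mean(baseline), stdev(baseline)
        if sigma > 0 and abs(daily_values[i] - mu) > n_sigmas * sigma:
            flagged.append(i)
    return flagged

# Daily maximum wind speed in km/h; day 7 contains the gust.
wind = [10, 12, 11, 9, 10, 12, 11, 50, 11]
gust_days = abrupt_changes(wind)
```

Note that a large spike inflates the baseline for the following days, so values right after an event are less likely to be flagged. That is usually acceptable for a diary, where one mention of the event is enough.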
- Fully Chaotic data
  - Strategy: Report changes in intensity only, without expectation-based comparisons.
  - Processing:
    - Use rate of change rather than trend comparison.
    - Generate insights: “An unexpected 40 mm rainfall event occurred in a short period, saturating the soil.”
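For data with no usable baseline, the rate of change between consecutive readings can stand in for a trend comparison. The sketch below assumes equally spaced readings (15 minutes by default) and a hypothetical helper name, `steepest_change`.

```python
def steepest_change(values, interval_minutes=15):
    """Return (rate per hour, index) for the largest absolute change between
    consecutive readings; index is None if nothing changed."""
    per_hour = 60 / interval_minutes
    best_rate, best_index = 0.0, None
    for i in range(1, len(values)):
        rate = (values[i] - values[i - 1]) * per_hour
        if abs(rate) > abs(best_rate):
            best_rate, best_index = rate, i
    return best_rate, best_index

# Cumulative rain gauge readings in mm, one every 15 minutes.
gauge = [0, 0, 0, 10, 25, 40, 40]
rate, at_index = steepest_change(gauge)
```

With a cumulative gauge, the returned rate is the rainfall intensity in mm per hour at the steepest step, and the index tells you when it happened.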
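Since you have both public satellite data and sensor data, you can split history retrieval between them: the satellite record supplies the long-term history, and your own sensors supply the short-term history.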
Since your sensor records data every 15 minutes, you get 96 readings per day per sensor. You’ll need to summarize this efficiently before passing it to the LLM.
Depending on the type of data, use different statistical reductions:
| Data Type | Aggregation Strategy |
|---|---|
| Highly Periodic (Solar, etc.) | Compute daily max, min, and curve fit |
| Statistically Predictable (Temperature, etc.) | Compute mean, min, max, deviation from expected |
| Semi-Chaotic (Wind, Rain) | Compute total, max gust/intensity, variation from previous days |
| Fully Chaotic (Unpredictable Events) | Extract spikes or significant deviations |
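
The table above can be sketched as a single dispatch function. Everything here is an illustrative assumption: the category keys, the `expected` parameter (the seasonal norm or the previous days' average, depending on category), and the 3-sigma spike rule. The curve fit for periodic data is simplified to the position of the daily peak.

```python
from statistics import mean, pstdev

def aggregate_day(values, category, expected=None, spike_sigmas=3.0):
    """Reduce one day's readings (e.g. 96 values) according to the data category."""
    if category == "periodic":
        peak = max(values)
        # Simplification: peak position stands in for a full curve fit.
        return {"min": min(values), "max": peak, "peak_index": values.index(peak)}
    if category == "predictable":
        avg = mean(values)
        return {"min": min(values), "max": max(values), "mean": avg,
                "deviation": None if expected is None else avg - expected}
    if category == "semi_chaotic":
        total = sum(values)
        return {"total": total, "max": max(values),
                "vs_previous": None if expected is None else total - expected}
    if category == "chaotic":
        mu, sigma = mean(values), pstdev(values)
        spikes = [i for i, v in enumerate(values)
                  if sigma > 0 and abs(v - mu) > spike_sigmas * sigma]
        return {"spike_indices": spikes}
    raise ValueError(f"unknown category: {category}")
```

Keeping the category as an explicit argument means the same pipeline can process every sensor, with the category assigned once per sensor in configuration.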

A resulting daily summary might look like this:

| Metric | Min | Max | Mean | Deviation | Notable Event |
|---|---|---|---|---|---|
| Air Temp (°C) | 12 | 25 | 18.4 | +2.1°C above avg | Warm spell |
| Solar Radiation (W/m²) | 0 | 850 | 470 | -10% from avg | Cloudy afternoon |
| Wind Speed (km/h) | 5 | 40 | 15 | +20 km/h | Sudden gusts at 14:30 |
| Rainfall (mm) | 0 | 15 | 3 | +50% vs. avg | Short storm at 16:00 |
This kind of structured data ensures that the LLM gets the right context to write meaningful diary entries.
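If you want to hand the summary to the LLM in exactly this tabular form, a small renderer is enough. This is a sketch; the dictionary keys and the `render_summary` name are assumptions chosen to mirror the column headers.

```python
COLUMNS = [
    ("metric", "Metric"), ("min", "Min"), ("max", "Max"),
    ("mean", "Mean"), ("deviation", "Deviation"), ("event", "Notable Event"),
]

def render_summary(rows):
    """rows: list of dicts keyed by the first element of each COLUMNS pair.
    Returns a markdown table as a single string."""
    header = "| " + " | ".join(title for _, title in COLUMNS) + " |"
    divider = "|" + "---|" * len(COLUMNS)
    body = ["| " + " | ".join(str(row[key]) for key, _ in COLUMNS) + " |"
            for row in rows]
    return "\n".join([header, divider] + body)

table = render_summary([
    {"metric": "Rainfall (mm)", "min": 0, "max": 15, "mean": 3,
     "deviation": "+50% vs. avg", "event": "Short storm at 16:00"},
])
```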
To ensure the LLM focuses on important events rather than routine data, implement change detection methods:
- Rate of Change
  - Example: If temperature changes by 5°C or more within an hour, flag it as an event.
- Threshold-Based Alerts
  - Example: If rainfall exceeds 20 mm in an hour, mark it as a storm event.
- Comparisons to Recent Trends
  - Example: If today’s wind speed is double the weekly average, highlight it.
These event flags will help the LLM focus on what really matters.
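The three rules above can be combined into one pass over the day's data. This is a rough sketch that hard-codes the example thresholds (5°C per hour, 20 mm per hour, double the weekly average); the function name, argument layout, and event labels are hypothetical, and each rule reports only its first match.

```python
def detect_events(temps, rain_mm, wind_today_mean, wind_week_mean,
                  interval_minutes=15):
    """temps: temperature readings; rain_mm: rainfall per reading interval.
    Returns a list of (event_name, reading_index_or_None)."""
    steps = 60 // interval_minutes  # readings per hour
    events = []
    # Rate of change: temperature swing of 5 degrees C or more within an hour.
    for i in range(steps, len(temps)):
        if abs(temps[i] - temps[i - steps]) >= 5.0:
            events.append(("rapid_temperature_change", i))
            break
    # Threshold: more than 20 mm of rain in any one-hour window.
    for i in range(len(rain_mm) - steps + 1):
        if sum(rain_mm[i:i + steps]) > 20.0:
            events.append(("storm", i))
            break
    # Recent trend: today's mean wind at least double the weekly average.
    if wind_week_mean > 0 and wind_today_mean >= 2 * wind_week_mean:
        events.append(("unusually_windy", None))
    return events
```

The reading index converts back to a clock time as `index * interval_minutes` minutes after midnight, which gives the LLM the “at 14:30” detail for the diary entry.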
Rather than giving the LLM raw numbers, you’ll pass preprocessed insights like this:
- Warm day with temperatures reaching 25°C (+2.1°C above normal).
- Solar radiation slightly below expected due to cloud cover.
- Sudden wind gusts of 40 km/h recorded at 14:30.
- Light rain (15 mm), with a brief heavy storm at 16:00.
- This week has been windier than usual (+20% avg increase).
- Compared to last February, temperatures are trending higher.
- Sudden afternoon gusts.
- Short but intense rainfall event.
From this, the LLM can write:
“Today started with warm and stable conditions, but by the afternoon, sudden gusts of wind swept through the area, reaching speeds of 40 km/h. Around 4 PM, a brief but intense rainstorm hit, dropping 15 mm of rain in a short period, soaking the soil. The solar radiation was slightly weaker than usual, suggesting partial cloud cover. Overall, it was a dynamic day, with shifting weather patterns compared to previous years.”
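To get from the preprocessed insights to an entry like the one above, the insights can be assembled into a plain-text prompt. The sketch below only builds the prompt string and does not call any LLM API. The section titles, the instruction wording, and the `build_prompt` name are assumptions you would adapt to your own model and style.

```python
def build_prompt(date, summary, context, events):
    """Assemble preprocessed insight sentences into one prompt string.
    Each of summary, context and events is a list of short sentences."""
    sections = [
        ("Daily summary", summary),
        ("Historical context", context),
        ("Notable events", events),
    ]
    lines = [f"Write a short weather diary entry for {date} using only the facts below."]
    for title, items in sections:
        if items:  # skip empty sections so quiet days produce short prompts
            lines.append("")
            lines.append(f"{title}:")
            lines.extend(f"- {item}" for item in items)
    return "\n".join(lines)

prompt = build_prompt(
    "2024-02-10",  # hypothetical date
    ["Warm day with temperatures reaching 25°C (+2.1°C above normal)."],
    ["This week has been windier than usual."],
    ["Sudden afternoon gusts.", "Short but intense rainfall event."],
)
```

Restricting the model to “only the facts below” is a simple guard against invented details, since every number in the entry should trace back to a preprocessed insight.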
✅ Preprocess each category differently (Periodic, Predictable, Semi-Chaotic, Chaotic).
✅ Use historical data to add context (long-term satellite, short-term sensors).
✅ Aggregate 96 daily points into summarized statistics.
✅ Detect significant changes using thresholds & trend analysis.
✅ Pass a structured summary to the LLM rather than raw data.
Would you like me to generate some Python code to handle these preprocessing steps? 🚀