DIMMiN Notes

My working notes related to the development of dimmin.com.

2025-11-27-Thursday

2025-11-23-Sunday

  • Allowed beta testers to access the Taskmaster App via the UI (closed in this PR)
  • Created a weekly progress bar in the UI that shows users how many days they have been consistent with their task (closed in this PR)
  • Updated this weekly progress bar to respond to user actions via AJAX requests
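
The count behind a weekly progress bar like this could be computed roughly as follows. This is only a sketch: the function name `weekly_progress`, the Monday-to-Sunday week, and the idea of passing in a plain list of completion dates are all assumptions, not the Taskmaster App's actual code.

```python
from datetime import date, timedelta


def weekly_progress(completion_dates, today):
    """Return (days_done, fraction) for the Monday-to-Sunday week containing `today`.

    `completion_dates` is an iterable of `date` objects (hypothetical input shape).
    """
    week_start = today - timedelta(days=today.weekday())  # Monday of this week
    week_end = week_start + timedelta(days=6)             # Sunday of this week
    # A set keeps each day once, so two completions on one day count as one.
    days_done = {d for d in completion_dates if week_start <= d <= week_end}
    return len(days_done), len(days_done) / 7


completions = [
    date(2025, 11, 16),  # previous week, ignored
    date(2025, 11, 17),
    date(2025, 11, 19),
    date(2025, 11, 19),  # duplicate day, counted once
    date(2025, 11, 23),
]
days, fraction = weekly_progress(completions, today=date(2025, 11, 23))
```

An AJAX endpoint could return `days` and `fraction` as JSON so the bar can be redrawn without a page reload.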

2025-11-22-Saturday

2025-11-21-Friday

2025-10-29-Wednesday

2025-10-08-Wednesday

  • Found a way to accurately join the competition and lot tables (first on explicitly validated rows where a direct link was present, then on rank and farm_name)
  • Cleaned up the competition and lot aggregation code (there is no longer a separate script that aggregates this data)
  • Now have a dataset with ~4.3k valid rows of different coffees
  • Started associating each coffee lot with its country and year
  • Identified a bug where some URLs are associated with incorrect lot information in the output of offline_pipeline_spider.py
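
The two-pass join described in the first bullet above might look something like the sketch below. It uses plain dictionaries rather than the real tables, and the field names `lot_url`, `url`, `rank`, and `farm_name` as well as the function name are assumptions made for illustration. It also assumes all rows come from a single competition, since rank and farm_name alone would not be unique across competitions.

```python
def join_competition_to_lots(competition_rows, lot_rows):
    """Match competition rows to lots: direct link first, then (rank, farm_name)."""
    lots_by_url = {lot["url"]: lot for lot in lot_rows}
    lots_by_key = {}
    for lot in lot_rows:
        key = (lot["rank"], lot["farm_name"].strip().lower())
        lots_by_key.setdefault(key, []).append(lot)

    matched, unmatched, used_urls = [], [], set()

    # Pass 1: rows that carry an explicit, validated link to a lot page.
    for row in competition_rows:
        lot = lots_by_url.get(row.get("lot_url"))
        if lot is not None:
            matched.append({**row, **lot, "match": "direct_link"})
            used_urls.add(lot["url"])
        else:
            unmatched.append(row)

    # Pass 2: fall back to rank + farm_name, accepting only unambiguous matches.
    for row in unmatched:
        key = (row["rank"], row["farm_name"].strip().lower())
        candidates = [l for l in lots_by_key.get(key, []) if l["url"] not in used_urls]
        if len(candidates) == 1:
            lot = candidates[0]
            matched.append({**row, **lot, "match": "rank_and_farm_name"})
            used_urls.add(lot["url"])
    return matched


competition = [
    {"rank": "1", "farm_name": "Finca A", "lot_url": "u1"},
    {"rank": "2", "farm_name": "finca b ", "lot_url": None},
    {"rank": "3", "farm_name": "Finca C", "lot_url": None},
]
lots = [
    {"url": "u1", "rank": "1", "farm_name": "Finca A"},
    {"url": "u2", "rank": "2", "farm_name": "Finca B"},
]
joined = join_competition_to_lots(competition, lots)
```

Recording which pass produced each match (the `match` field) makes it easier to audit the weaker fallback matches later.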

2025-10-06-Monday

  • Added a check that a lot's url feature was in associated_lots before adding it as a feature via the offline_pipeline_spider
  • Validated that most pages with invalid associated lots actually did not have references to lots
  • Identified a bug that produced duplicate information; combining data where possible yielded 5,317 unique coffee lots (validated by their individual URLs) within the competition page
  • A direct inner join between composite_lot_data.csv and score_df yields 5,251 unique rows (indicating a much higher match rate than I saw initially)
  • Identified (and fixed) a bug in the price_per_unit float converter code so that it accounts for a comma (,) used to mark cents.
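
A converter that tolerates a comma before the cents could be sketched as below. The function name `parse_price` and the exact heuristic (a trailing comma followed by one or two digits, with no period present, is treated as the decimal separator) are assumptions, not the fix actually made in the pipeline.

```python
import re


def parse_price(raw):
    """Convert a scraped price string to a float, or None if it cannot be parsed."""
    cleaned = re.sub(r"[^\d.,]", "", raw)  # drop currency symbols, units, spaces
    if not cleaned:
        return None
    if "," in cleaned and "." not in cleaned and re.search(r",\d{1,2}$", cleaned):
        # The comma marks cents, e.g. "5,50" means 5.50.
        head, cents = cleaned.rsplit(",", 1)
        cleaned = head.replace(",", "") + "." + cents
    else:
        # Any commas are thousands separators, e.g. "1,234.50".
        cleaned = cleaned.replace(",", "")
    try:
        return float(cleaned)
    except ValueError:
        return None
```

Under this heuristic `"$5,50"` is read as 5.5 while `"$1,234.50"` is read as 1234.5; a string such as `"1,234"` stays ambiguous in principle and is treated here as one thousand two hundred thirty-four.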

2025-10-05-Sunday

  • Found out that the price_per_unit is a good marker of the auction table within the competition page
  • Found an additional 2,000 lot associations for price_per_unit (now at ~4.8k total lots)
  • Changed the format of associated_lot to a list to eliminate redundant data
  • Started using the associated_lot feature to determine which tables on a given page have a direct link to lot-level data
  • Broke trackpad on Elvis so I couldn't use the mouse (this resolved itself on system reboot but was sp00ky).
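
Using price_per_unit as the marker for the auction table, and storing associated lots as a list without repeats, could be sketched like this. The table representation (a dict with `columns` and `rows`) and both function names are hypothetical; the real spider works on scraped HTML and may differ considerably.

```python
def find_auction_tables(tables, marker="price_per_unit"):
    """Keep only the tables whose columns include the marker column."""
    return [table for table in tables if marker in table["columns"]]


def collect_associated_lots(table, url_column="url"):
    """Gather lot URLs from a table as a list with no repeats, preserving order."""
    urls = (row[url_column] for row in table["rows"] if row.get(url_column))
    return list(dict.fromkeys(urls))  # dict keys are unique and keep insertion order


tables = [
    {"columns": ["rank", "farm_name", "score"], "rows": []},
    {
        "columns": ["rank", "price_per_unit", "url"],
        "rows": [{"url": "a"}, {"url": "b"}, {"url": "a"}, {"url": None}],
    },
]
auction_tables = find_auction_tables(tables)
associated_lots = collect_associated_lots(auction_tables[0])
```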

2025-10-04-Saturday

  • Normalized total_lot_value_usd and price_per_unit (thankfully the latter is always in $/lb, which makes things easier)
  • Normalized auction_lot_size_kg (though this one needs some work: integer values here typically indicate 30 kg boxes of coffee rather than a raw weight in kg, which is usually given to two decimal places)
  • Identified that the most important feature for this analysis will be price_per_unit, so it makes the most sense to focus on the features most relevant to it.
  • Finished a rough draft of the processing for this dataset, which currently provides 2,444 coffee lots with a known price_per_unit value. Given that we start with 5,482 unique coffee lots identifiable by their unique URLs, the biggest bottleneck I've identified right now is associating a coffee lot with its respective auction results in the competition page.
  • Added an association between each competition and its respective lots in the coe_scraper/spiders/offline_pipeline_spider.py pipeline procedure. This should make it much easier to join competition page data to its associated lot.
  • Added a quick validation check within this pipeline to see whether the links available in the competition page are also available locally.
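
One way the auction_lot_size_kg caveat above could be handled is sketched below. It assumes that a value written as a bare integer is a count of 30 kg boxes and that a value with decimals is already a weight in kg; the note itself says this rule still needs work, so treat both the rule and the name `lot_size_kg` as assumptions.

```python
BOX_WEIGHT_KG = 30.0  # assumed box weight, taken from the note above


def lot_size_kg(raw):
    """Convert a raw lot-size string to kilograms under the assumed box rule."""
    text = raw.strip()
    if text.isdigit():
        # Bare integer: assumed to be a number of 30 kg boxes.
        return int(text) * BOX_WEIGHT_KG
    # Otherwise: assumed to already be a weight in kg, e.g. "412.35".
    return float(text)
```

For example, `lot_size_kg("12")` gives 360.0 and `lot_size_kg("412.35")` gives 412.35. The rule works on the raw string because the integer-versus-decimal distinction is lost once the value has been parsed to a float.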

2025-10-03-Friday

  • Normalized coffee variety, processing_system, and altitude features
