2024-12-12-Thursday
created: 2024-12-12 06:08
tags:
  - daily-notes
Thursday, December 12, 2024
<< [[Timestamps/2024/12-December/2024-12-11-Wednesday|Yesterday]] | [[Timestamps/2024/12-December/2024-12-13-Friday|Tomorrow]] >>
🎯 Goal
- [x] Create a Django Management Command that can load data from Markdown files directly from an S3 Bucket into the BigBrain App.
🌟 Results
- Created a Django Management Command that could load notes from S3 Buckets into the BigBrain App in Production.
🌱 Next Time
- Create a scheduled cron job with a Celery worker so that these BigBrain App notes are periodically refreshed to reflect their current state in my Local Version.
📝 Notes
Yesterday I worked through uploading my notes from Obsidian to an S3 Bucket using Django Management Commands. I can now upload my notes from a given directory efficiently using the command:
```bash
python manage.py upload_obsidian_vault_to_s3 --vault-path "/path/to/vault"
```
The default vault path can be set in environment variables or specified explicitly with the `--vault-path` argument.
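As a rough sketch of how a command like this can wire up the `--vault-path` argument with an environment-variable fallback (the `OBSIDIAN_VAULT_PATH` variable name, the metadata key, and the handler body are illustrative assumptions, not the actual implementation):

```python
import os
from pathlib import Path

import boto3
from django.core.management.base import BaseCommand, CommandError


class Command(BaseCommand):
    help = "Upload Markdown notes from an Obsidian vault to S3 (illustrative sketch)."

    def add_arguments(self, parser):
        parser.add_argument(
            "--vault-path",
            default=os.environ.get("OBSIDIAN_VAULT_PATH"),  # assumed env var name
            help="Path to the local Obsidian vault.",
        )

    def handle(self, *args, **options):
        vault_path = options["vault_path"]
        if not vault_path:
            raise CommandError("Set OBSIDIAN_VAULT_PATH or pass --vault-path.")

        s3 = boto3.client("s3")
        bucket = os.environ["AWS_BUCKET_NAME"]  # assumed to match the app's settings
        for md_file in Path(vault_path).rglob("*.md"):
            key = str(md_file.relative_to(vault_path))
            s3.upload_file(
                str(md_file),
                bucket,
                key,
                # Stash the local modification time as user metadata on the object.
                ExtraArgs={"Metadata": {"modificationdate": str(md_file.stat().st_mtime)}},
            )
            self.stdout.write(f"Uploaded {key}")
```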
I asked ChatGPT to create the vault import logic from S3, and it managed to put it together just fine. Before long I was able to push all of my Obsidian notes to Production. I'd like to handle the "up to date" logic more efficiently in batches, similar to how I did with uploading notes to S3.
Now I can import notes from S3 into the BigBrain App via:
```bash
heroku run bash -a dimmin
python manage.py import_obsidian_vault_from_s3 --vault-name "WGU MSDADS"
```
I can also run it on my Local Version instead of having to run it each time in Production, which is nice. Unfortunately, in the API call where we list the objects:
```python
import boto3

s3 = boto3.client("s3")  # client setup implied in the actual command

# Walk every object under the vault's prefix, one page at a time.
paginator = s3.get_paginator("list_objects_v2")
files_metadata = {}
for page in paginator.paginate(Bucket=AWS_BUCKET_NAME, Prefix=prefix):
    for obj in page.get("Contents", []):
        print(obj)
```
we get back the following response for each object:
```python
{'Key': '.../AWS Athena.md', 'LastModified': datetime.datetime(2024, 12, 11, 16, 24, 35, tzinfo=tzutc()), 'ETag': '"..."', 'Size': 247, 'StorageClass': 'STANDARD'}
```
This includes the `LastModified` date (from the S3 perspective, i.e. when the data was loaded in), but not the explicitly defined user metadata `modificationdate`. This is disappointing, and while ChatGPT does have a workaround where we include this `modificationdate` as part of the tags within our individual files, it seems convoluted...
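For what it's worth, the user metadata isn't lost; it just isn't included in `ListObjectsV2` responses. A `head_object` call per key does return it, which could be a less convoluted alternative to the tags workaround, at the cost of one extra request per file. A minimal sketch, assuming the metadata key was written as `modificationdate` at upload time:

```python
import boto3

s3 = boto3.client("s3")


def get_user_metadata(bucket: str, key: str) -> dict:
    """Fetch the user-defined metadata for one object (one extra request per key)."""
    head = s3.head_object(Bucket=bucket, Key=key)
    # User metadata comes back lowercased, without the x-amz-meta- prefix.
    return head["Metadata"]


# Hypothetical usage against a key from the listing above:
# get_user_metadata(AWS_BUCKET_NAME, obj["Key"]).get("modificationdate")
```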
Currently, the file-date check compares the S3 modification date (`LastModified`) against the note's `modificationdate` to determine if a note needs to be updated. It might actually work if we store the `LastModified` date and compare it against the current `LastModified` date, because we only re-upload modified files in the upload script.
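A small sketch of that check, assuming we keep the `LastModified` value we saw at the previous import on the note record (`stored_last_modified` is a hypothetical field name, not the actual model):

```python
from datetime import datetime
from typing import Optional


def needs_update(obj: dict, stored_last_modified: Optional[datetime]) -> bool:
    """True when the S3 copy is newer than whatever we imported last time.

    This works because the upload script only re-uploads files that changed,
    so a newer LastModified implies new content.
    """
    if stored_last_modified is None:
        return True  # never imported before
    return obj["LastModified"] > stored_last_modified
```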
While it's currently not very efficient, at least it's cool to see my actual notes (and their links) in the BigBrain App! Next I'll work through scheduling these updates using Celery workers. It only takes a few moments to execute this command, so if it's scheduled to run in the afternoon when I'm done working on my projects for the day, it's not a big deal. I've made it into a GitHub issue and I'll get to it when needed.
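When I get to that issue, the periodic refresh could be as simple as a Celery beat entry that re-runs the existing management command via `call_command`. A sketch under assumptions (the app name, schedule, and vault name are placeholders, not the final setup):

```python
from celery import Celery
from celery.schedules import crontab
from django.core.management import call_command

app = Celery("bigbrain")  # assumed Celery app name


@app.task(name="refresh_obsidian_notes")
def refresh_obsidian_notes():
    # Re-run the existing import command so notes mirror the latest S3 state.
    call_command("import_obsidian_vault_from_s3", vault_name="WGU MSDADS")


app.conf.beat_schedule = {
    "refresh-obsidian-notes-daily": {
        "task": "refresh_obsidian_notes",
        # Late afternoon, after the day's project work is done.
        "schedule": crontab(hour=17, minute=0),
    },
}
```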
Notes created today
```dataview
List FROM "" WHERE file.cday = date("2024-12-12") SORT file.ctime asc
```
Notes last touched today
```dataview
List FROM "" WHERE file.mday = date("2024-12-12") SORT file.mtime asc
```