2024-12-12-Thursday
created: 2024-12-12 06:08
tags:
  - daily-notes
Thursday, December 12, 2024
<< [[Timestamps/2024/12-December/2024-12-11-Wednesday|Yesterday]] | [[Timestamps/2024/12-December/2024-12-13-Friday|Tomorrow]] >>
🎯 Goal
- [x] Create a Django Management Command that can load data from Markdown files directly from an S3 Bucket into the BigBrain App.
🌟 Results
- Created a Django Management Command that could load notes from S3 Buckets into the BigBrain App in Production.
🌱 Next Time
- Create a scheduled cron job with a Celery worker so that these BigBrain App notes are periodically refreshed to reflect their current state in my Local Version.
📝 Notes
Yesterday I worked through uploading my notes from Obsidian to an S3 Bucket using Django Management Commands. I can now upload my notes from a given directory efficiently using the command:
```bash
python manage.py upload_obsidian_vault_to_s3 --vault-path "/path/to/vault"
```
The default vault path can be set in environment variables or specified explicitly with the `--vault-path` argument.
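As a rough sketch of how a command like this can wire up the `--vault-path` argument with an environment-variable fallback (the `OBSIDIAN_VAULT_PATH` variable name, the metadata key, and the handler body are illustrative assumptions, not the actual implementation):

```python
import os
from pathlib import Path

import boto3
from django.core.management.base import BaseCommand, CommandError


class Command(BaseCommand):
    help = "Upload Markdown notes from an Obsidian vault to S3 (illustrative sketch)."

    def add_arguments(self, parser):
        parser.add_argument(
            "--vault-path",
            default=os.environ.get("OBSIDIAN_VAULT_PATH"),  # assumed env var name
            help="Path to the local Obsidian vault.",
        )

    def handle(self, *args, **options):
        vault_path = options["vault_path"]
        if not vault_path:
            raise CommandError("Set OBSIDIAN_VAULT_PATH or pass --vault-path.")

        s3 = boto3.client("s3")
        bucket = os.environ["AWS_BUCKET_NAME"]  # assumed to match the app's settings
        for md_file in Path(vault_path).rglob("*.md"):
            key = str(md_file.relative_to(vault_path))
            s3.upload_file(
                str(md_file),
                bucket,
                key,
                # Stash the local modification time as user metadata on the object.
                ExtraArgs={"Metadata": {"modificationdate": str(md_file.stat().st_mtime)}},
            )
            self.stdout.write(f"Uploaded {key}")
```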
I asked ChatGPT to create the vault import logic from S3, and it managed to put it together just fine. Before long I was able to push all of my Obsidian notes to Production. I'd like to handle the "up to date" logic more efficiently in batches, similar to how I did with uploading notes to S3.
Now I can import notes from S3 into the BigBrain App via:
```bash
heroku run bash -a dimmin
python manage.py import_obsidian_vault_from_s3 --vault-name "WGU MSDADS"
```
I can also run it on my Local Version instead of having to run it each time in Production, which is nice. Unfortunately, in the API call where we list the objects:
```python
import boto3

s3 = boto3.client("s3")  # client setup implied in the actual command

# Walk every object under the vault's prefix, one page at a time.
paginator = s3.get_paginator("list_objects_v2")
files_metadata = {}
for page in paginator.paginate(Bucket=AWS_BUCKET_NAME, Prefix=prefix):
    for obj in page.get("Contents", []):
        print(obj)
```
we get back the following response for each object:
```python
{'Key': '.../AWS Athena.md', 'LastModified': datetime.datetime(2024, 12, 11, 16, 24, 35, tzinfo=tzutc()), 'ETag': '"..."', 'Size': 247, 'StorageClass': 'STANDARD'}
```
This includes the `LastModified` date (from the S3 perspective, i.e. when the data was loaded in), but not the explicitly defined user metadata `modificationdate`. This is disappointing, and while ChatGPT does have a workaround where we include this `modificationdate` as part of the tags within our individual files, it seems convoluted...
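For what it's worth, the user metadata isn't lost; it just isn't included in `ListObjectsV2` responses. A `head_object` call per key does return it, which could be a less convoluted alternative to the tags workaround, at the cost of one extra request per file. A minimal sketch, assuming the metadata key was written as `modificationdate` at upload time:

```python
import boto3

s3 = boto3.client("s3")


def get_user_metadata(bucket: str, key: str) -> dict:
    """Fetch the user-defined metadata for one object (one extra request per key)."""
    head = s3.head_object(Bucket=bucket, Key=key)
    # User metadata comes back lowercased, without the x-amz-meta- prefix.
    return head["Metadata"]


# Hypothetical usage against a key from the listing above:
# get_user_metadata(AWS_BUCKET_NAME, obj["Key"]).get("modificationdate")
```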
Currently, the file-date check compares the S3 modification date (`LastModified`) against the note's `modificationdate` to determine if a note needs to be updated. It might actually work if we store the `LastModified` date and compare it against the current `LastModified` date, because we only re-upload modified files in the upload script.
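A small sketch of that check, assuming we keep the `LastModified` value we saw at the previous import on the note record (`stored_last_modified` is a hypothetical field name, not the actual model):

```python
from datetime import datetime
from typing import Optional


def needs_update(obj: dict, stored_last_modified: Optional[datetime]) -> bool:
    """True when the S3 copy is newer than whatever we imported last time.

    This works because the upload script only re-uploads files that changed,
    so a newer LastModified implies new content.
    """
    if stored_last_modified is None:
        return True  # never imported before
    return obj["LastModified"] > stored_last_modified
```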
While it's currently not very efficient, at least it's cool to see my actual notes (and their links) in the BigBrain App! Next I'll work through scheduling these updates using Celery workers. It only takes a few moments to execute this command, so if it's scheduled to run in the afternoon when I'm done working on my projects for the day, it's not a big deal. I've made it into a GitHub issue and I'll get to it when needed.
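When I get to that issue, the periodic refresh could be as simple as a Celery beat entry that re-runs the existing management command via `call_command`. A sketch under assumptions (the app name, schedule, and vault name are placeholders, not the final setup):

```python
from celery import Celery
from celery.schedules import crontab
from django.core.management import call_command

app = Celery("bigbrain")  # assumed Celery app name


@app.task(name="refresh_obsidian_notes")
def refresh_obsidian_notes():
    # Re-run the existing import command so notes mirror the latest S3 state.
    call_command("import_obsidian_vault_from_s3", vault_name="WGU MSDADS")


app.conf.beat_schedule = {
    "refresh-obsidian-notes-daily": {
        "task": "refresh_obsidian_notes",
        # Late afternoon, after the day's project work is done.
        "schedule": crontab(hour=17, minute=0),
    },
}
```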
Notes created today
```dataview
List FROM "" WHERE file.cday = date("2024-12-12") SORT file.ctime asc
```
Notes last touched today
```dataview
List FROM "" WHERE file.mday = date("2024-12-12") SORT file.mtime asc
```