Cloud Storage Levels and Prior Retrieval Strategies
In thinking about cloud storage for radiology, the cost comes when images are retrieved (at least for systems I am familiar with).
I am greatly simplifying here, but bear with me…
There are varying levels or depths of storage, which some term hot/warm/cold. They reflect not only the speed with which items can be retrieved, but also the cost per unit of time (per month?) to keep them there.
The coldest storage would presumably be for items that are not expected to be needed frequently, perhaps only for legal reasons, etc. This would be the cheapest type of storage…
Hot implies rapid retrieval and high availability, and also a higher cost per unit of time.
My two-part question revolves around ways to decide which priors could spend time in the “colder” tiers for a given period to save money, and then either be brought to hot storage closer to when they might be needed or be kept in cold storage forever.
Presumably this would need to be an automated process, perhaps rules-driven, or possibly, but not necessarily, AI-based.
As an example, if someone gets a yearly mammogram at a given facility, we could keep their prior mammos in deeper storage for most of the year and perhaps bring them “closer to the surface” as it gets closer to the time for them to return. Perhaps the retrieval/pre-fetch from cold storage could be triggered by an order, although that may not leave enough time. Or it could just be a rule that says: we know patient xyz comes every year for a mammo, so start getting those mammos into hot storage now….
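To make the yearly-mammogram rule concrete, here is a minimal sketch of what a rules-driven pre-fetch check might look like. Everything in it is an assumption for illustration: the PriorStudy record, the tier labels, the 365-day interval, the 30-day lead time, and the 60-day grace period are hypothetical and not taken from any real PACS or cloud vendor API.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class PriorStudy:
    # Hypothetical record; a real archive would expose far more metadata.
    patient_id: str
    modality: str      # e.g. "MG" is the DICOM modality code for mammography
    study_date: date
    tier: str          # "hot" or "cold" (illustrative labels)


def studies_to_prefetch(studies, today, interval_days=365,
                        lead_days=30, grace_days=60):
    """Return cold studies whose patient is expected back soon.

    The expected return date is the most recent exam of the same modality
    plus interval_days. A study is selected if today falls between
    lead_days before that date and grace_days after it, so patients who
    never came back are not pre-fetched forever.
    """
    # Find the most recent exam date per (patient, modality).
    latest = {}
    for s in studies:
        key = (s.patient_id, s.modality)
        if key not in latest or s.study_date > latest[key]:
            latest[key] = s.study_date

    selected = []
    for s in studies:
        due = latest[(s.patient_id, s.modality)] + timedelta(days=interval_days)
        window_start = due - timedelta(days=lead_days)
        window_end = due + timedelta(days=grace_days)
        if s.tier == "cold" and window_start <= today <= window_end:
            selected.append(s)
    return selected
```

A nightly job could call studies_to_prefetch(all_priors, date.today()) and queue the result for a move to hot storage. An order-triggered pre-fetch would simply be a second rule that feeds the same queue.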
See where I am going?
I could see facilities saving money with cloud storage given retrieval strategies based on patient type, or exam time, or diagnosis type, etc…
Does anyone know of this type of research being done somewhere?
For an interesting example I found this:
As of January 2022, KHUH had accumulated around one million medical image studies, resulting in around 476 million files with a total data volume of 44 TB and KHUH is adding around 1 TB of new data every month. [b]An assessment of access patterns showed that out of those one million studies, only 94,000 studies from older years (2011-2019) were retrieved in 2021.[/b] These 94,000 studies represented a data volume of 3 TB so only around 7% of historic PACS data is retrieved annually.
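As a quick sanity check on the quoted figures, the arithmetic can be reproduced in a few lines. This sketch uses only the numbers in the quote and assumes, as the quote appears to, that the denominators are the full archive (one million studies, 44 TB).

```python
# Figures taken directly from the quoted passage.
total_studies = 1_000_000
total_tb = 44
retrieved_studies = 94_000
retrieved_tb = 3

share_of_studies = retrieved_studies / total_studies
share_of_volume = retrieved_tb / total_tb

print(f"{share_of_studies:.1%} of studies, {share_of_volume:.1%} of data volume")
# prints: 9.4% of studies, 6.8% of data volume
```

So the "around 7%" in the quote is the share by data volume; by study count it is a little over 9%.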
I don’t know if those numbers would be the same for a US facility, but just as a rough example….
If we could learn the access patterns for priors, we could save some storage money.
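A rough way to put numbers on that idea is a toy cost model following the simplified picture above: a per-TB monthly storage price for each tier plus a per-TB charge when cold data is retrieved. The function and its parameter names are hypothetical, and real cloud pricing has more components (request fees, minimum storage durations, egress), so treat this as a back-of-the-envelope sketch only.

```python
def monthly_cost(total_tb, cold_fraction, retrieved_tb_per_month,
                 hot_price, cold_price, retrieval_price):
    """Estimate monthly cost for a two-tier archive.

    Prices are per TB: hot_price and cold_price per month of storage,
    retrieval_price per TB pulled out of the cold tier.
    """
    hot_tb = total_tb * (1 - cold_fraction)
    cold_tb = total_tb * cold_fraction
    return (hot_tb * hot_price
            + cold_tb * cold_price
            + retrieved_tb_per_month * retrieval_price)


# The prices below are made-up placeholders, NOT real vendor prices.
all_hot = monthly_cost(44, 0.0, 0.0, hot_price=20, cold_price=4, retrieval_price=30)
tiered = monthly_cost(44, 0.9, 0.25, hot_price=20, cold_price=4, retrieval_price=30)
print(all_hot, tiered)
```

Plugging in a facility's own archive size, its observed retrieval volume, and its actual contract prices would show whether tiering pays off, and how much a wrong pre-fetch guess costs.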