Cloud Storage Levels and Prior Retrieval Strategies

PACS Forum

Posted by vascular28_304 on August 8, 2023 at 12:56 pm

In thinking about cloud storage for radiology, the cost comes when images are retrieved (at least for the systems I am familiar with).
I am greatly simplifying here, but bear with me…
There are varying levels or depths of storage. Some call these hot/warm/cold; they reflect not only how quickly images can be retrieved but also the cost per unit of time (per month?) to keep them there.
The coldest storage would presumably be for items that are not expected to be needed often, perhaps kept only for legal reasons. This would be the cheapest type of storage…

Hot implies rapid retrieval/High Availability, and also higher cost per time unit.

My two-part question revolves around ways to decide which priors could sit in the “colder” tiers for a given period, to save money, and then either be brought to hot storage closer to when they might be needed or be kept in cold storage forever.
Presumably this would need to be an automated process, perhaps rules-driven, or possibly (but not necessarily) AI.

As an example, if someone gets a yearly mammogram at a given facility, we could keep their prior mammos in deeper storage for most of the year and bring them “closer to the surface” as it gets closer to the time for them to return. The retrieval/pre-fetch from cold storage could be triggered by an order, although that may not leave enough time. Or it could just be a rule that says: we know patient xyz comes every year for a mammo, so start getting those mammos into hot storage now….
See where I am going?
I could see facilities saving money with cloud storage given retrieval strategies based on patient type, or exam time, or diagnosis type, etc…
Does anyone know of this type of research being done somewhere?
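
To make the yearly-mammogram rule concrete, here is a minimal Python sketch of what a rules-driven trigger might look like. It is only an illustration: the function names, the median-gap heuristic, and the 30-day `lead_days` window are my own assumptions, not taken from any PACS or cloud product.

```python
from datetime import date, timedelta
from statistics import median


def predict_next_visit(exam_dates):
    """Guess the next visit from the median gap between past exams.

    Hypothetical heuristic: returns None when there are fewer than
    two exams, since no interval can be inferred.
    """
    if len(exam_dates) < 2:
        return None
    ordered = sorted(exam_dates)
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    return ordered[-1] + timedelta(days=int(median(gaps)))


def should_prefetch(exam_dates, today, lead_days=30):
    """True once today is within lead_days of (or past) the predicted visit."""
    expected = predict_next_visit(exam_dates)
    if expected is None:
        return False
    return today >= expected - timedelta(days=lead_days)


# Example: a patient screened every March for three years.
history = [date(2020, 3, 1), date(2021, 3, 3), date(2022, 3, 2)]
print(predict_next_visit(history))
print(should_prefetch(history, date(2023, 2, 10)))
```

A real rule would also need a cutoff for patients who never return, and an order-driven trigger would still be useful as a backstop for visits the pattern does not predict.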

For an interesting example I found this:
As of January 2022, KHUH had accumulated around one million medical image studies, resulting in around 476 million files with a total data volume of 44 TB and KHUH is adding around 1 TB of new data every month. [b]An assessment of access patterns showed that out of those one million studies, only 94,000 studies from older years (2011-2019) were retrieved in 2021.[/b] These 94,000 studies represented a data volume of 3 TB so only around 7% of historic PACS data is retrieved annually.

I don’t know if those numbers would be the same for a US facility, but just as a rough example….
If we could learn the access patterns for priors, we could save some storage money.
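
As a rough check on the KHUH figures quoted above, the short Python sketch below recomputes the retrieved share by volume and by study count, then plugs the volume share into a simple two-tier cost model. The per-TB prices are invented placeholders, not real vendor pricing, and the model ignores retrieval fees, which is exactly where the cost shows up in the systems I know.

```python
def share(part, whole):
    """Fraction of the whole represented by part."""
    return part / whole


def monthly_storage_cost(total_tb, hot_fraction, hot_price, cold_price):
    """Monthly storage cost when hot_fraction of the data stays in the hot tier.

    Prices are per TB per month. Retrieval and request fees are ignored.
    """
    hot_tb = total_tb * hot_fraction
    cold_tb = total_tb - hot_tb
    return hot_tb * hot_price + cold_tb * cold_price


# Figures from the quoted KHUH assessment.
volume_share = share(3, 44)             # roughly 0.068, i.e. "around 7%"
study_share = share(94_000, 1_000_000)  # 0.094 by study count

# HYPOTHETICAL prices per TB-month, for illustration only.
HOT_PRICE = 20.0
COLD_PRICE = 4.0

all_hot = monthly_storage_cost(44, 1.0, HOT_PRICE, COLD_PRICE)
tiered = monthly_storage_cost(44, volume_share, HOT_PRICE, COLD_PRICE)
print(f"retrieved share by volume: {volume_share:.1%}, by studies: {study_share:.1%}")
print(f"all hot: {all_hot:.2f}, tiered: {tiered:.2f}")
```

The catch is that this assumes we know in advance which 3 TB will be needed, which is the whole question.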


ggaspar

Member
August 9, 2023 at 4:10 am

I believe the major cloud storage vendors are already coming up with viable solutions. The following is a section of an article I came across earlier.

Cost-effective storage for medical images

HealthImaging provides cost-effective storage that can reduce the total cost of ownership for storing new data and image archives of any size. HealthImaging offers a Frequent Access storage tier for new and frequently accessed data, and an Archive Instant Access tier that is cost-effective for infrequently accessed data. Data stored for more than 30 days is automatically moved to the archive tier. The behavior is similar to the Amazon Simple Storage Service (Amazon S3) Intelligent-Tiering storage class, passing on cost savings to customers.

Both storage tiers of HealthImaging support data retrieval in milliseconds. Every image frame stored in HealthImaging can be accessed and rendered with sub-second latency, reducing the need for customers to stage data in expensive block storage volumes.
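
The 30-day rule in the quoted article can be modeled in a few lines of Python. This is a toy model of the quoted wording only, not the HealthImaging API: the function name and the tier labels are assumptions made for illustration.

```python
from datetime import date

# Threshold taken from the quoted article: data stored for more than
# 30 days moves to the archive tier.
ARCHIVE_AFTER_DAYS = 30


def tier_for(stored_on, today):
    """Return a tier label based only on how long the data has been stored."""
    age_days = (today - stored_on).days
    if age_days > ARCHIVE_AFTER_DAYS:
        return "archive_instant_access"  # illustrative label, not an API value
    return "frequent_access"             # illustrative label, not an API value


print(tier_for(date(2023, 8, 1), date(2023, 8, 9)))
print(tier_for(date(2023, 6, 1), date(2023, 8, 9)))
```

Note that the rule as quoted looks only at storage age, not at who is likely to need the study next.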
- vascular28_304
  
  Member
  August 9, 2023 at 2:57 pm
  
  Yes, but what I’m asking about is What is the most Frequently Accessed data….with respect to Priors?
  How do you know which tier to put what in?
    - jonhanse_770
      
      Member
      August 10, 2023 at 6:03 am
      
      Pragmatically, just move all the images to the cloud. This will also provide the backup required by CMS so you continue to get Medicare reimbursement. It may cost a bit more initially, but the savings in aggravation will offset that cost many times over. There are many companies that perform this service beyond HealthImaging, although HI is used by several of the OEM providers. Just do a search for image data migration and they will pop up.
      
      If you want to discuss this further give me a call at (407) 247-7345. Good luck.
      
      Mike Cannavo “PACSMan”
      - Unknown Member
        
        Deleted User
        August 31, 2023 at 6:30 am
        
        !vendor alert!
        I work for Amazon Web Services and lead the development of AWS HealthImaging. AWS HealthImaging supports automatic storage tiering: data that is not being used is automatically moved to the lower-cost storage tier. The data access speed is the same regardless of which tier the data is on. This eliminates the need to “prefetch” data from one tier to another. I am personally very excited about this feature, as our research revealed that there is no 100% reliable algorithm to prefetch data. Clinicians had very negative experiences when data wasn’t prefetched. This led us to find a way to eliminate the prefetch problem entirely. You can read more about AWS HealthImaging’s automatic storage tiering here:
        
        [link=https://docs.aws.amazon.com/healthimaging/latest/devguide/understanding-storage-tiers.html]https://docs.aws.amazon.c…ing-storage-tiers.html[/link]
        
        vascular28_304
        
        Member
        September 1, 2023 at 11:56 am
        
        Chris,
        That’s all very interesting. I think it will save money, but I have to ask: how do you “know” which data isn’t being used? Radiologists will want to look at priors that may have been acquired years before. Just because a study is 30 days old doesn’t mean it’s not “being used”. Do you use any predictive algorithms to determine which priors, for which types of patients, need to remain on a “closer” tier than the others? For example, with mammo, a patient may have been getting their yearly screening for the past 10 years. Then, when there’s a finding in the current screener, a radiologist may want to compare it with the baseline that was performed 10 years ago, along with the most recent couple of years. My point is that just because something is “older” doesn’t mean it’s not being used. How does AWS account for that?
        Thank you for the response it was very informative.
        
        Unknown Member
        
        Deleted User
        September 5, 2023 at 12:06 pm
        
        We don’t have to know when the data might be used because it is always available at the same speed. This is a feature unique to AWS HealthImaging that is similar to S3 Intelligent-Tiering but optimized for medical imaging use cases (specifically, data is ALWAYS available without ever having to prefetch).
        
        vascular28_304
        
        Member
        September 5, 2023 at 12:38 pm
        
        That makes sense, thank you very much for your response.
