-
-
Notifications
You must be signed in to change notification settings - Fork 1k
fix: rasterize node document leakage with hashmap and eq check #3550
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
fix: rasterize node document leakage with hashmap and eq check #3550
Conversation
| pub image_data: Vec<(u64, Image<Color>)>, | ||
| /// HashMap for O(1) lookup: content_hash -> (id, full_image_for_collision_check) | ||
| /// Used in deduplicating images efficiently | ||
| image_lookup: HashMap<u64, Vec<(u64, Image<Color>)>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This ends up storing the image data twice? I don't think that this is desirable since images can have large memory requirements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmnmn, think I'll have the map reference the vector
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't be allowed due to rusts ownership rules, but you can store an Arc<Vec<…>> in both. But I'm a bit confused why you have to do this at all, I would expect that just storing (u64, Image<Color>) in a HashSet should be all that we need, right? The hash set / hash map already implement the eq check on hash collisions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes total sense - was redundantly reimplementing what HashMap does automatically. One follow-up question though , went with the manual approach because I was trying to preserve source_node_id when available (to avoid regenerating IDs for existing images). Your solution generates new UUIDs on every HashMap miss , is the source node Id not relevant here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The image source is just cheaper to compute if it exists, but if it does not exist the code should behave the same way. If the source ids are the same, we know that the images will also be the same without having to check them. We could also extend this with a second hashmap from source_id to image id and always first try to find the source id in the source_id hashmap (if we have a source id) and if not look up the full image. That should give us optimal perf characteristics, but I think I still prefer this simpler solution if the perf is good enough
|
Alright, I've now pushed another suggestion. If the perf of this is acceptable, I think this is going to be the simplest solution which is hopefully also relatively easy to maintain. @0HyperCube thoughts? |
|
This is so much more cleaner , wow |
Second approach for fixing this bug . Will be tested and benchmarked against #3539 to see if this is better
Closes #3533