Fix db_poller fetching so it does not use the saved last_sent_id on initial poll and saves the poll time as last_sent #156
Open
rzmong wants to merge 1 commit into
Pull Request Template
Description
The `db_poller` performs polls at regular intervals by referencing the `last_sent_id` and `last_sent` columns stored in the `deimos_poll_info` table. During a poll, the `db_poller` may process multiple batches of items, with a default batch size of 1000.

After an initial `poll_query`, the `db_poller` saves the `updated_at` and `id` of the last record it processed as the `last_sent` and `last_sent_id` respectively, i.e.:

[Table: Updates at time of initial poll]

[Table: `deimos_poll_info` table after initial poll]

In this example, the initial poll leaves `last_sent` at `2` and `last_sent_id` at `300`.

On the next poll, `last_sent` is used to determine the time bounds (`time_from:` will be `2` and `time_to:` will be approximately the time of the second poll), and `last_sent_id` is used as the `min_id` for the next `poll_query(time_from:, time_to:, column_name:, min_id:)`. Suppose these updates have occurred since the initial poll:

[Table: Updates at time of second poll]

The `poll_query` takes in `300` as the `min_id` and has no way of knowing that it should query for updates across all id values, so it misses the updates for ids `100` and `200`.

If we set the `min_id` to `0` when starting a new poll interval (i.e. when `batch_count == 0`), then the `poll_query` can trust the `min_id` to be valid and we can find updates across all id values.

If we do this, though, since we've stored `2` as the `last_sent`, the `poll_query` will receive a `time_from` of `2` and a `time_to` of approximately the time of the second poll. Now that we query across all id values, the `poll_query` could pull in records from the previous poll with `updated_at=2`. To avoid this, we can store `time_to` as the value of `last_sent` after we've completed all processing during a poll interval. This should ensure that we don't include already processed records in subsequent polls.

So in this PR I've made changes to set the `min_id` for the `poll_query` to `0` when `batch_count == 0`, and I've also stored the `time_to` value as the `last_sent` once all processing during a poll interval has completed.
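To make the scenario in the description concrete, here is a minimal in-memory sketch. It is not Deimos's actual code: the `Record` struct, this simplified `poll_query` filter (inclusive time window plus an id floor), and the timestamps other than `last_sent = 2` and `last_sent_id = 300` are assumptions made up for illustration.

```ruby
# Simplified, in-memory model of the polling scenario. All names and values
# are illustrative assumptions, not Deimos's actual implementation.
Record = Struct.new(:id, :updated_at)

# Hypothetical stand-in for poll_query: inclusive time window plus an id floor.
def poll_query(records, time_from:, time_to:, min_id:)
  records.select do |r|
    r.updated_at >= time_from && r.updated_at <= time_to && r.id > min_id
  end
end

# State saved after the initial poll: record 300 (updated_at = 2) was processed last.
last_sent = 2
last_sent_id = 300

# Rows as they look at the second poll: ids 100 and 200 were updated since
# the initial poll (the times 5, 6 and 10 are assumed values).
records = [Record.new(100, 5), Record.new(200, 6), Record.new(300, 2)]
second_poll_time = 10

# Before: the saved last_sent_id is reused as min_id, so ids 100 and 200 are skipped.
before = poll_query(records, time_from: last_sent, time_to: second_poll_time,
                             min_id: last_sent_id)

# min_id = 0 alone: the updates are found, but record 300 is picked up again
# because its updated_at equals the stored last_sent.
min_id_only = poll_query(records, time_from: last_sent, time_to: second_poll_time,
                                  min_id: 0)

# Proposed: min_id = 0, and last_sent holds the previous poll's time_to
# (assumed to be 3 here), which excludes the already processed record.
initial_poll_time = 3
after = poll_query(records, time_from: initial_poll_time, time_to: second_poll_time,
                            min_id: 0)

puts before.map(&:id).inspect       # []
puts min_id_only.map(&:id).inspect  # [100, 200, 300]
puts after.map(&:id).inspect        # [100, 200]
```

The middle case shows why resetting `min_id` alone is not enough under this model: without also moving `last_sent` forward to the poll time, the record processed last in the previous interval would be fetched a second time.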