Avoid deleting whole threads when ripping 1 page #13

Open
roelderickx wants to merge 1 commit into labster:master from roelderickx:limitdelete

@roelderickx

Hi,

While ripping a thread with more than 20 posts, I found that only the last messages are saved in the database.

The script downloads everything, and the output confirms this, but in the example below only the 5 messages with sequence numbers 20 to 24 end up in the database:

Gathering data from https://www.tapatalk.com/groups/***/viewtopic.php
looking for thread t=1827&start=0 - downloaded - ***
10 saved
looking for thread t=1827&start=20 - downloaded - ***
5 saved

The cause is the download_thread function, which is called recursively and deletes the whole thread on every call. I modified it to delete only the posts from $start to $start + $posts_per_page - 1.
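To illustrate the idea, here is a minimal sketch in Python rather than the script's own language. The `posts` table, its columns, and the helper names `save_page`, `make_db`, and `saved_sequences` are hypothetical and not taken from the script; only the delete range, `start` to `start + posts_per_page - 1`, comes from the description above.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical posts table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE posts ("
        " thread_id INTEGER, seq INTEGER, body TEXT,"
        " PRIMARY KEY (thread_id, seq))"
    )
    return conn


def save_page(conn, thread_id, start, bodies, posts_per_page=20):
    """Replace only the posts that belong to one page of a thread.

    Rows outside start .. start + posts_per_page - 1 are left alone,
    so saving a later page no longer wipes the earlier ones.
    """
    end = start + posts_per_page - 1
    with conn:  # commit on success, roll back on error
        conn.execute(
            "DELETE FROM posts WHERE thread_id = ? AND seq BETWEEN ? AND ?",
            (thread_id, start, end),
        )
        conn.executemany(
            "INSERT INTO posts (thread_id, seq, body) VALUES (?, ?, ?)",
            [(thread_id, start + i, body) for i, body in enumerate(bodies)],
        )


def saved_sequences(conn, thread_id):
    """Return the sequence numbers currently stored for a thread."""
    rows = conn.execute(
        "SELECT seq FROM posts WHERE thread_id = ? ORDER BY seq", (thread_id,)
    )
    return [row[0] for row in rows]
```

With this approach, saving the page that starts at 20 only touches sequences 20 to 39, so the posts stored for the page that starts at 0 survive. Re-ripping a single page still replaces that page's rows cleanly.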

@roelderickx
Author

After using the script for a while, it turns out there is more to be done before it works:

  • The login has been changed
  • A page contains only 10 posts instead of 20
  • The parsing of the username has been changed
  • Bogus and unauthorized threads must be detected differently
  • Fetching the edit count always returns 403 Forbidden
  • Some dates cannot be parsed if they are in the current year, e.g. '11:41 PM - Apr 03'

I resolved all of the issues above except the last two, which I don't consider that important. Do you want me to include all of the changes I made in this pull request?
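For the unresolved date issue in the list above, one possible approach is to append the current year before parsing. The sketch below is in Python, not the script's own language, and `parse_yearless` is a hypothetical helper. It assumes the year-less format is exactly the one in the example ('11:41 PM - Apr 03'), that a missing year means the current year, and that month and AM/PM names are in English.

```python
from datetime import date, datetime


def parse_yearless(text, today=None):
    """Parse a timestamp like '11:41 PM - Apr 03' that omits the year.

    Assumption: a missing year means the year of `today`
    (defaults to the current date).
    """
    if today is None:
        today = date.today()
    # Appending the year first avoids the 1900 default that strptime
    # would otherwise use, which also breaks on 'Feb 29'.
    return datetime.strptime(
        "%s %d" % (text.strip(), today.year), "%I:%M %p - %b %d %Y"
    )
```

Timestamps that already carry a year would still need the existing parser; this only covers the year-less form.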
