
Add preprocessing functions #205

Open

sjavis wants to merge 21 commits into main from preprocessing


sjavis (Collaborator) commented Jun 1, 2026

Adds preprocessing functions covering all of the topics on the 'data preprocessing' page of the docs, intended for use with the batching (#194). In general, the functions accept either file paths or cf.Field objects as inputs and return either a single cf.Field or a list of cf.Field. Each function can also write its result to a file via the output_file argument.
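A minimal sketch of the input/output convention described above, assuming nothing about the PR's actual helpers: load_inputs and finalise are hypothetical names, and reader/writer are hypothetical injected callables (for example, thin wrappers around cf.read and cf.write) so the sketch runs without cf-python installed.

```python
from pathlib import Path
from typing import Any, Callable, Optional


def load_inputs(inputs: Any, reader: Callable[[list], list]) -> list:
    """Normalise file paths and/or already-loaded fields into a flat list of fields.

    `inputs` may be a single path, a single field, or a list/tuple mixing both.
    `reader` is a hypothetical callable that takes a list of paths and returns
    a list of fields (e.g. a thin wrapper around cf.read).
    """
    if not isinstance(inputs, (list, tuple)):
        inputs = [inputs]
    fields = []
    for item in inputs:
        if isinstance(item, (str, Path)):
            fields.extend(reader([item]))  # read the file from disk
        else:
            fields.append(item)  # assume it is already a field object
    return fields


def finalise(result: Any, output_file: Optional[str] = None,
             writer: Optional[Callable[[Any, str], None]] = None) -> Any:
    """Optionally write `result` to `output_file`, then return it unchanged."""
    if output_file is not None:
        if writer is None:
            raise ValueError("a writer is required when output_file is given")
        writer(result, output_file)
    return result
```

Injecting the reader and writer keeps the path-versus-field logic testable in isolation; a real implementation would more likely call cf-python directly.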

  • Preprocessing utility functions in src/tctrack/preprocessing.py
  • Tests
  • Documentation & add to the 'data preprocessing' page.
  • Replace preprocessing in tutorial
  • Add checks that esmpy is installed for regridding. Update the docs to reflect that this is now a "proper dependency".
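One way to implement the esmpy check is sketched here using only the standard library's importlib.util.find_spec. The helper name require_module and the error wording are hypothetical, not the PR's code.

```python
import importlib.util


def require_module(name: str, purpose: str) -> None:
    """Raise ImportError with a helpful message if top-level module `name` is missing."""
    if importlib.util.find_spec(name) is None:
        raise ImportError(
            f"{name} is required for {purpose} but is not installed."
        )


# Hypothetical usage at the start of a regridding helper:
# require_module("esmpy", "regridding")
```

Checking with find_spec avoids importing the (heavy) module just to test for its presence.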

Closes #189

@sjavis sjavis self-assigned this Jun 1, 2026
@sjavis sjavis marked this pull request as ready for review June 2, 2026 10:01

MarionBWeinzierl (Member) left a comment

Just had a quick look and left some comments. @surbhigoel77, could you take a more thorough look through this PR?

Comment thread tutorial/preprocess_data.py
Comment thread tests/unit/preprocessing/test_preprocessing.py Outdated
Comment thread docs/data/preprocessing_data.rst Outdated
@sjavis sjavis linked an issue Jun 9, 2026 that may be closed by this pull request
@sjavis sjavis removed a link to an issue Jun 10, 2026
@sjavis sjavis requested a review from surbhigoel77 June 17, 2026 08:06
Return cf.Field objects instead of size-1 lists.
Also allow size-1 lists to be passed as inputs to functions that expect
single fields.
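The two behaviours in this commit can be sketched as a pair of small helpers. The names as_single and squeeze are hypothetical, not taken from the PR, and they work on any objects rather than cf.Field specifically.

```python
from typing import Any, Sequence


def as_single(value: Any) -> Any:
    """Accept either a single object or a size-1 list/tuple and return the object."""
    if isinstance(value, (list, tuple)):
        if len(value) != 1:
            raise ValueError(f"expected a single field, got {len(value)}")
        return value[0]
    return value


def squeeze(results: Sequence[Any]) -> Any:
    """Return the lone element of a size-1 result, otherwise the results as a list."""
    return results[0] if len(results) == 1 else list(results)
```

as_single handles the input side (functions expecting one field), while squeeze handles the output side (returning a field rather than a size-1 list).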
sjavis (Collaborator, Author) commented Jun 18, 2026

In fc12ab3 I have also added a correction for what I suspect is an error in cf.curl_xy that causes the result to be the negative of the actual value. I will add an issue to cf-python about this when I get time.

Combined with updates to TSTORMS, this seems to resolve the issue in #156. Those updates will be made in another PR.

@sjavis sjavis mentioned this pull request Jun 18, 2026
```python
list[cf.Field]
The list of fields read from the input files.
"""
fields = {field.nc_get_variable(): field for field in read_files(input_files)}
```

Member


Maybe we can add a check, before building the dict, that raises a ValueError if a variable name is missing.

Collaborator Author


Perhaps I'm misunderstanding, but the loop below already checks whether any of the output variable names are missing.

I don't think it would be easy to do the check before building the dict, because we can only know which variables are available after reading the input files.
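A sketch of a post-read check that reports every missing name at once, under the assumption that each field only needs an nc_get_variable() method (as used in the snippet above). select_variables is a hypothetical name, not the PR's function.

```python
from typing import Any, Dict, Iterable, List


def select_variables(fields: Iterable[Any], names: List[str]) -> List[Any]:
    """Index fields by netCDF variable name and return the requested ones, in order.

    All missing names are reported together in a single ValueError.
    """
    by_name: Dict[str, Any] = {f.nc_get_variable(): f for f in fields}
    missing = [name for name in names if name not in by_name]
    if missing:
        raise ValueError(
            f"Variables not found in input files: {', '.join(missing)}"
        )
    return [by_name[name] for name in names]
```

Because the available names are only known after reading, the check necessarily runs after the dict is built, which matches the point made in the reply.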

```python
# Negate the curl due to a suspected error in cf.curl_xy for spherical polar coords
# (In the first term the gradient is taken wrt latitude, not theta)
# (The second term is not negated)
curl.data = -curl.data
```

Member


Not quite clear on why we are negating the curl?

Member


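I found this formula for the curl (with φ the latitude and λ the longitude):

$$\mathrm{curl}(u, v) = \frac{1}{r\cos\varphi}\frac{\partial v}{\partial \lambda} - \frac{1}{r\cos\varphi}\frac{\partial (u\cos\varphi)}{\partial \varphi}$$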

As per your comment, only the gradient in the first term needs negating, because latitude and θ increase in opposite directions. Why would we negate the entire curl?

Collaborator Author


cf.curl_xy uses the spherical polar form of the equation, where $\theta$ is the colatitude and $\phi$ the longitude:

$$\frac{1}{r \sin\theta} \left( \frac{\partial (f_\phi \sin\theta)}{\partial \theta} - \frac{\partial f_\theta}{\partial \phi} \right)$$

So the first term has the wrong sign because, as already stated, the code takes the derivative with respect to latitude rather than $\theta$, and latitude increases in the opposite direction to $\theta$.

For the second term, I mistakenly thought it had been added rather than subtracted. The actual issue is that $f_\theta$ should be the component of $f$ in the southward direction, so $f_\theta = -v$, whereas the code uses $v$. I've modified the comment to reflect this.

So overall it is calculating the negative of the curl, hence the need for the negation.
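For reference, the sign argument can be written out term by term. This is a sketch assuming $\theta$ is the colatitude, $\phi$ the longitude, $\chi = \pi/2 - \theta$ the latitude (a symbol introduced only for this derivation), $f_\phi = u$ (eastward) and $f_\theta = -v$ (southward):

$$\sin\theta = \cos\chi, \qquad \frac{\partial}{\partial\theta} = -\frac{\partial}{\partial\chi}$$

$$\frac{1}{r \sin\theta} \left( \frac{\partial (f_\phi \sin\theta)}{\partial \theta} - \frac{\partial f_\theta}{\partial \phi} \right) = \frac{1}{r\cos\chi}\left(\frac{\partial v}{\partial\phi} - \frac{\partial (u\cos\chi)}{\partial\chi}\right)$$

which is the standard vertical relative vorticity. If, as described in this thread for cf.curl_xy, the derivative is taken with respect to $\chi$ instead of $\theta$ and $f_\theta = v$ is used, both terms change sign:

$$\frac{1}{r\cos\chi}\left(\frac{\partial (u\cos\chi)}{\partial\chi} - \frac{\partial v}{\partial\phi}\right) = -\frac{1}{r\cos\chi}\left(\frac{\partial v}{\partial\phi} - \frac{\partial (u\cos\chi)}{\partial\chi}\right)$$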

I will open an issue on cf-python regarding this.
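A quick numerical check of the sign, which could also serve as a reproducer for the cf-python issue, is sketched below using only NumPy. It does not call cf.curl_xy: curl_as_described is a hypothetical re-implementation of the behaviour described in this thread (derivative with respect to latitude, and v used as the θ component), and the Earth radius, grid and solid-body-rotation test field are assumptions. For eastward flow u = U0·cos(lat) with v = 0, the exact vorticity is 2·U0·sin(lat)/r, which is positive in the northern hemisphere.

```python
import numpy as np

R = 6.371e6  # assumed Earth radius in metres
U0 = 10.0    # assumed equatorial speed of a solid-body rotation, m/s

lon = np.deg2rad(np.arange(0.0, 360.0, 2.5))
lat = np.deg2rad(np.arange(-80.0, 80.1, 2.5))
LAT, LON = np.meshgrid(lat, lon, indexing="ij")  # arrays have axes (lat, lon)

u = U0 * np.cos(LAT)  # eastward wind
v = np.zeros_like(u)  # northward wind


def curl_correct(u, v, lat, lon, r=R):
    """Vertical relative vorticity on a sphere; lat/lon in radians."""
    coslat = np.cos(lat)[:, None]
    dv_dlon = np.gradient(v, lon, axis=1, edge_order=2)
    ducos_dlat = np.gradient(u * coslat, lat, axis=0, edge_order=2)
    return (dv_dlon - ducos_dlat) / (r * coslat)


def curl_as_described(u, v, lat, lon, r=R):
    """Polar form with d/d(theta) replaced by d/d(lat) and f_theta taken as +v.

    Hypothetical reproduction of the behaviour described in the thread,
    not cf-python's actual code.
    """
    sin_theta = np.cos(lat)[:, None]  # sin(colatitude) == cos(latitude)
    d_fphi_dtheta = np.gradient(u * sin_theta, lat, axis=0, edge_order=2)
    d_ftheta_dphi = np.gradient(v, lon, axis=1, edge_order=2)
    return (d_fphi_dtheta - d_ftheta_dphi) / (r * sin_theta)


zeta = curl_correct(u, v, lat, lon)
zeta_bad = curl_as_described(u, v, lat, lon)
expected = 2 * U0 * np.sin(LAT) / R

print("matches analytic vorticity:", np.allclose(zeta, expected, rtol=0.02))
print("differs only in sign:", np.allclose(zeta_bad, -zeta))
```

The finite differences ignore longitude periodicity, which does not matter here because v is zero everywhere.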

MarionBWeinzierl (Member) left a comment

I am happy on my part, but will leave it to @surbhigoel77 to approve this PR after she has done another review.


Development

Successfully merging this pull request may close these issues.

Data preprocessing functions

3 participants