add deduplication of types #1004

Open

walter9388 wants to merge 8 commits into asottile:main from walter9388:deduplicate-types

walter9388 commented Feb 20, 2025

Relating to #982

It took me a while to get round to this, but here we are.

I think there are three levels at which this issue could be approached:

1. Remove extra None types only:

-def f(x: Optional[Union[int, None]]): pass
+def f(x: int | None): pass
-def g(x: Union[Optional[int], None]): pass
+def g(x: int | None): pass

2. Remove any duplicated scalar types at the same depth by name in Union blocks:

-def f(x: Union[Union[Union[Union[a, b], c], d], a]): pass
+def f(x: a | b | c | d): pass
-def g(x: Union[a.b | a.c, a.b, list[str], str]): pass
+def g(x: a.b | a.c | list[str] | str): pass

3. General deduplication at any depth on any block:

-def f(x: Union[list[Union[int, str]], list[Union[str, int]]]): pass
+def f(x: list[int | str]): pass

I settled on level 2, as it was still possible in a single pass and seemed more useful than focusing on None types alone.
I couldn't see a way to approach the general problem (level 3) without recursively building a tree structure and then comparing the leaf nodes. However, I am not deeply familiar with the standard Python libraries for parsing ASTs, so if there are simple built-in methods for problems like this, I would be interested to know!
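For level 3, one possible route is the standard library's ast module, which already parses an annotation into a tree. The sketch below is only an illustration under several assumptions: it needs Python 3.9+ (for ast.unparse), it recognises only the bare names Union and Optional, and the helper names _args, _merge, _parts, and dedupe_annotation are hypothetical and not part of pyupgrade. It also works on source text rather than tokens, so comments and formatting would not survive.

```python
import ast


def _args(node: ast.Subscript) -> list[ast.expr]:
    """Return the subscript arguments as a list (Python 3.9+ AST layout)."""
    inner = node.slice
    return list(inner.elts) if isinstance(inner, ast.Tuple) else [inner]


def _merge(*groups: dict[str, str]) -> dict[str, str]:
    """Combine members in order; the first spelling of each key wins."""
    out: dict[str, str] = {}
    for group in groups:
        for key, text in group.items():
            out.setdefault(key, text)
    return out


def _parts(node: ast.expr) -> dict[str, str]:
    """Map an order-insensitive key to rendered text for each union member."""
    # PEP 604 form: a | b
    if isinstance(node, ast.BinOp) and isinstance(node.op, ast.BitOr):
        return _merge(_parts(node.left), _parts(node.right))
    if isinstance(node, ast.Subscript):
        base = ast.unparse(node.value)
        if base == 'Union':
            return _merge(*(_parts(arg) for arg in _args(node)))
        if base == 'Optional':
            return _merge(_parts(node.slice), {'None': 'None'})
        # Any other generic (list[...], dict[..., ...]): normalise each
        # argument on its own, then rebuild the subscript.
        keys, texts = [], []
        for arg in _args(node):
            parts = _parts(arg)
            keys.append(' | '.join(sorted(parts)))      # order-insensitive
            texts.append(' | '.join(parts.values()))    # first-seen order
        return {f'{base}[{", ".join(keys)}]': f'{base}[{", ".join(texts)}]'}
    text = ast.unparse(node)
    return {text: text}


def dedupe_annotation(source: str) -> str:
    """Rewrite one annotation expression as a deduplicated PEP 604 union."""
    tree = ast.parse(source, mode='eval')
    return ' | '.join(_parts(tree.body).values())


print(dedupe_annotation('Union[list[Union[int, str]], list[Union[str, int]]]'))
```

Because two members are compared by a key with sorted parts, list[int | str] and list[str | int] collapse into one entry while the output keeps the first spelling encountered.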

I used the existing scan in _fix_union to determine the delimiters at depth==1 between the types. This seemed to work well, but I definitely ran into some interesting edge cases when it came to handling comments, whitespace and multi-line annotations.
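As a rough illustration of the depth==1 scan described above, the sketch below walks a flat list of token strings and records the commas that separate top-level members. It is a simplification under stated assumptions: tokens are plain strings rather than pyupgrade's Token objects, and the function name depth_one_commas is hypothetical.

```python
OPENING = frozenset('([{')
CLOSING = frozenset(')]}')


def depth_one_commas(tokens: list[str], start: int) -> tuple[list[int], int]:
    """Return (indices of commas at depth 1, index of the closing bracket).

    `start` must be the index of the opening bracket of the Union[...] block.
    """
    depth = 0
    commas = []
    for i in range(start, len(tokens)):
        tok = tokens[i]
        if tok in OPENING:
            depth += 1
        elif tok in CLOSING:
            depth -= 1
            if depth == 0:  # matched the bracket we started on
                return commas, i
        elif tok == ',' and depth == 1:
            # Only commas directly inside the outer brackets split members.
            commas.append(i)
    raise ValueError('unbalanced brackets')


# Union[a, list[b, c], a]
tokens = ['Union', '[', 'a', ',', 'list', '[', 'b', ',', 'c', ']', ',', 'a', ']']
commas, close = depth_one_commas(tokens, 1)
print(commas, close)
```

The comma inside list[b, c] sits at depth 2 and is skipped, so the members between the recorded positions can then be compared by name.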

I have managed to get this working for a variety of test cases, and I would be interested to hear your feedback.

Btw I enjoy your YouTube content! I have learnt a lot of niche things I would have struggled to pick up otherwise. So thank you for that!

walter9388 and others added 4 commits on February 20, 2025 14:53

eca6277 initial effort
dc9d5d9 clean up
1fae34a spelling
90dfcab [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)

asottile reviewed

Owner

asottile left a comment

just a quick first pass -- will look more closely later

tests/features/typing_pep604_test.py (outdated)

pyupgrade/_token_helpers.py (outdated)

walter9388 added 4 commits on February 22, 2025 20:03

1316d66 moved modified helper function into typing_pep604.py
2bbe9d7 fixed bad recursive typing xfail test
2a064a8 moved other modified helper function into typing_pep604.py
94931ad fixed flake8 failures

walter9388 commented

Author

walter9388 left a comment

Just highlighting a few things to be aware of.

pyupgrade/_plugins/typing_pep604.py

 def _fix_optional(i: int, tokens: list[Token]) -> None:
     j = find_op(tokens, i, '[')
-    k = find_closing_bracket(tokens, j)
+    k, contains_none = _find_closing_bracket_and_if_contains_none(tokens, j)

Author

walter9388 Feb 22, 2025

I modified the general find_closing_bracket function so that the same pass also checks whether the Optional block already contains None.
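The idea of doing both jobs in one pass can be sketched as follows. This is not the PR's implementation: tokens are plain strings instead of Token objects, the function name is hypothetical, and it assumes that only a None at depth 1 counts, so a None nested inside an inner Union[...] or list[...] is deliberately not detected here. The real helper may treat nesting differently.

```python
OPENING = frozenset('([{')
CLOSING = frozenset(')]}')


def find_closing_bracket_and_none(
        tokens: list[str], start: int,
) -> tuple[int, bool]:
    """Return (index of the matching close bracket, saw None at depth 1)."""
    depth = 0
    contains_none = False
    for i in range(start, len(tokens)):
        tok = tokens[i]
        if tok in OPENING:
            depth += 1
        elif tok in CLOSING:
            depth -= 1
            if depth == 0:
                return i, contains_none
        elif tok == 'None' and depth == 1:
            # Assumption: only a top-level None makes "| None" redundant.
            contains_none = True
    raise ValueError('unbalanced brackets')


# Optional[int | None] already contains None at the top level.
print(find_closing_bracket_and_none(['Optional', '[', 'int', '|', 'None', ']'], 1))
# Optional[list[None]] does not: the None belongs to list[...].
print(find_closing_bracket_and_none(['Optional', '[', 'list', '[', 'None', ']', ']'], 1))
```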

pyupgrade/_plugins/typing_pep604.py

Comment on lines +30 to +34

+    tokens[k:k + 1] = [
+        Token('UNIMPORTANT_WS', ' '),
+        Token('CODE', '| '),
+        Token('CODE', 'None'),
+    ]

Author

walter9388 Feb 22, 2025

I split the single token containing "| None" into separate whitespace, | and None tokens so that the deduplication and whitespace-removal functions used in _fix_union can process them. The same applies to the multi-line version a few lines below.

Owner

asottile Apr 19, 2025

it might be helpful to do all the tokens then -- | would be 'OP' and None would be 'NAME'
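To see where the 'OP' and 'NAME' names come from, the standard library tokenizer can be run on a small annotation. This is only a sketch using stdlib tokenize, on the assumption that pyupgrade's token names match the stdlib ones for these two tokens; the helper name token_types is hypothetical.

```python
import io
import tokenize


def token_types(source: str) -> list[tuple[str, str]]:
    """Return (token type name, token text) pairs, skipping empty tokens."""
    readline = io.StringIO(source).readline
    return [
        (tokenize.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(readline)
        if tok.string.strip()  # drop NEWLINE / ENDMARKER
    ]


print(token_types('int | None\n'))
```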

pyupgrade/_plugins/typing_pep604.py

Comment on lines +137 to +139

+    to_delete += _remove_consecutive_unimportant_ws(
+        tokens, [x for x in range(j, k) if x not in to_delete],
+    )

Author

walter9388 Feb 22, 2025

I'm not convinced this is the best way to remove whitespace, but I'm not sure what else to do when everything on a line except a comment has been deleted. I have written a test for this edge case, with id='duplicated types in multi-line nested unions or optionals'.

asottile reviewed

pyupgrade/_plugins/typing_pep604.py

 def _fix_optional(i: int, tokens: list[Token]) -> None:
     j = find_op(tokens, i, '[')
-    k = find_closing_bracket(tokens, j)
+    k, contains_none = _find_closing_bracket_and_if_contains_none(tokens, j)

Owner

asottile Apr 19, 2025

typically I add helper functions above where they're called rather than below

asottile reviewed

pyupgrade/_plugins/typing_pep604.py

+) -> list[int]:
+    to_delete = []
+    prev_name = ''
+    for kk in idxs:

Owner

asottile Apr 19, 2025

why kk?
