Poverty example by Zathimo · Pull Request #66 · plantnet/malpolon

Zathimo · 2024-10-11T09:43:30Z

📝 Changelog

new regression model
poverty data module
example script to run the regression model on poverty
easy access to images is yet to be implemented

✅ Checklist

Lint and tests pass locally with my changes
I've added necessary documentation

tlarcher · 2024-11-08T16:14:47Z

+            transform (torchvision.transforms): transform to apply to the data"""
+
+        super().__init__()
+        dataframe = pd.read_csv(dataset_path + labels_name)


Consider wrapping your strings in pathlib.Path objects to avoid sensitivity to the '/' (slash) separator.
e.g.:

from pathlib import Path
dataframe = pd.read_csv(Path(dataset_path) / Path(labels_name))

As we discussed, I removed the dataset from the PR.
I did make the change though, thanks for the advice!
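A minimal sketch of why the suggestion helps, assuming hypothetical helper names join_str and join_path and an example directory 'data/poverty'. Plain string concatenation silently depends on whether dataset_path ends with a slash, whereas the pathlib / operator inserts exactly one separator either way. pd.read_csv accepts Path objects directly, so no str() conversion is needed.

```python
from pathlib import Path


def join_str(dataset_path: str, name: str) -> str:
    # Plain concatenation: only correct if dataset_path already ends with '/'
    return dataset_path + name


def join_path(dataset_path: str, name: str) -> Path:
    # pathlib inserts exactly one separator, with or without a trailing '/'
    return Path(dataset_path) / name


# Outputs shown for a POSIX system (Windows would print backslashes)
print(join_str("data/poverty", "labels.csv"))    # data/povertylabels.csv
print(join_path("data/poverty", "labels.csv"))   # data/poverty/labels.csv
print(join_path("data/poverty/", "labels.csv"))  # data/poverty/labels.csv
```

One caveat: if the right-hand operand is absolute (e.g. '/labels.csv'), pathlib discards the left-hand side, so labels_name should stay a relative path.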

tlarcher · 2024-11-08T16:15:12Z

+        self.dataframe_train = dataframe[dataframe['subset'] == 'train']
+        self.dataframe_val = dataframe[dataframe['subset'] == 'val']
+        self.dataframe_test = dataframe[dataframe['subset'] == 'test']
+        self.tif_dir = dataset_path + tif_dir


Consider wrapping your strings in pathlib.Path objects to avoid sensitivity to the '/' (slash) separator.
e.g.:

from pathlib import Path
self.tif_dir = Path(dataset_path) / Path(tif_dir)

As we discussed, I removed the dataset from the PR.
I did make the change though, thanks for the advice!

tlarcher · 2024-11-08T16:17:37Z

+        self.inference_batch_size = inference_batch_size
+        self.dict_normalize = json.load(open('examples/poverty/mean_std_normalize.json', 'r'))
+        self.num_workers = num_workers
+        self.task = 'regression'


The task argument is expected to be registered in an example config file. However here it is hardcoded.

Consider either adding the 'task' argument as an input of your __init__() method, or checking in **kwargs whether such an argument exists (taking priority over the default value 'regression').

As we discussed, I removed the dataset from the PR.
I did make the change though, thanks for the advice!
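A minimal sketch of the two options suggested above, using hypothetical plain-Python stand-ins (ExplicitTaskDataModule, KwargsTaskDataModule) rather than the actual LightningDataModule subclass from the PR; the default batch size and worker count are also assumptions.

```python
class ExplicitTaskDataModule:
    """Option 1 (hypothetical): 'task' is an explicit constructor argument."""

    def __init__(self, inference_batch_size: int = 16, num_workers: int = 8,
                 task: str = 'regression'):
        self.inference_batch_size = inference_batch_size
        self.num_workers = num_workers
        # The config file can override the default by passing task=...
        self.task = task


class KwargsTaskDataModule:
    """Option 2 (hypothetical): 'task' is looked up in **kwargs."""

    def __init__(self, inference_batch_size: int = 16, num_workers: int = 8,
                 **kwargs):
        self.inference_batch_size = inference_batch_size
        self.num_workers = num_workers
        # A 'task' key coming from the config takes priority over the default
        self.task = kwargs.get('task', 'regression')


print(ExplicitTaskDataModule().task)                  # regression
print(KwargsTaskDataModule(task='multiclass').task)   # multiclass
```

Option 1 makes the parameter visible in the signature and docstring; option 2 keeps the signature unchanged but hides the parameter from readers of the code.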

tlarcher · 2024-11-08T16:55:31Z

+            dataset = MSDataset(self.dataframe_test, self.tif_dir, transform=self.test_transform())
+        return dataset
+
+    def train_dataloader(self):


As your code stands, the only benefit of redefining train/val/test_dataloaders is to use persistent_workers=True, which is relevant if your epochs are fast and you want to speed up training by not re-creating the workers at each epoch. This comes at the cost of memory.

If this is the intended use, ignore this comment. Otherwise, you can remove the redefined train/val/test_dataloaders altogether.

As we discussed, I removed the dataset from the PR.
I did make the change though, thanks for the advice!
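If persistent workers are the intended use, one point worth guarding: torch.utils.data.DataLoader raises a ValueError when persistent_workers=True is combined with num_workers=0. A minimal sketch of a hypothetical helper (dataloader_kwargs) that builds the keyword arguments and only enables persistence when worker processes exist; it is meant to be used as DataLoader(dataset, **dataloader_kwargs(...)).

```python
def dataloader_kwargs(batch_size: int, num_workers: int, shuffle: bool,
                      persistent_workers: bool = True) -> dict:
    """Build keyword arguments for torch.utils.data.DataLoader (hypothetical helper)."""
    return {
        'batch_size': batch_size,
        'num_workers': num_workers,
        'shuffle': shuffle,
        # DataLoader rejects persistent_workers=True when num_workers == 0,
        # so workers are only kept alive if there are worker processes at all
        'persistent_workers': persistent_workers and num_workers > 0,
    }


print(dataloader_kwargs(32, 4, shuffle=True)['persistent_workers'])   # True
print(dataloader_kwargs(32, 0, shuffle=False)['persistent_workers'])  # False
```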

tlarcher · 2024-11-08T19:04:41Z


        loss = self.loss(y_hat, self._cast_type_to_loss(y))  # Shape mismatch for binary: need to 'y = y.unsqueeze(1)' (or use .reshape(2)) to cast from [2] to [2,1] and cast y to float with .float()
-        self.log(f"loss/{split}", loss, **log_kwargs)
+        self.log(f"loss_{split}", loss, **log_kwargs)


The / (slash) character was chosen as the separator between the monitored quantity and the split because TensorBoard recognizes it and gathers all measurements of a given quantity in a single section. This is not the case with other characters.
Hence, / (slash) should be kept as the separator.

Changing that character would also mean updating every example.

I didn't think of that. I undid it.
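To illustrate the grouping argument, here is a small sketch that mimics how TensorBoard sections scalar tags by their prefix before the first '/'. It is an illustration of that rule, not TensorBoard's code; the helper group_tags and the metric name 'r2' are hypothetical.

```python
from collections import defaultdict


def group_tags(tags):
    """Group tags by the prefix before the first '/' (illustration only)."""
    groups = defaultdict(list)
    for tag in tags:
        groups[tag.split('/', 1)[0]].append(tag)
    return dict(groups)


metric_names = ['loss', 'r2']
splits = ['train', 'val']
slash_tags = [f"{m}/{s}" for m in metric_names for s in splits]
underscore_tags = [f"{m}_{s}" for m in metric_names for s in splits]

print(len(group_tags(slash_tags)))       # 2 sections: loss, r2
print(len(group_tags(underscore_tags)))  # 4 sections, one per tag
```

With '/' the train and val curves of a quantity share one section; with '_' every tag ends up in its own section.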

tlarcher · 2024-11-08T19:05:02Z

            else:
                score = metric_func(y_hat, y)
-            self.log(f"{metric_name}/{split}", score, **log_kwargs)
+            self.log(f"{metric_name}_{split}", score, **log_kwargs)


Same as https://github.com/plantnet/malpolon/pull/66/files#r1834916272

tlarcher · 2024-11-08T19:15:26Z

        for metric_name, metric_func in self.metrics.items():
            if isinstance(metric_func, dict):
-                score = metric_func['callable'](y_hat, y, **metric_func['kwargs'])
+                if metric_func['kwargs']:


Indeed, metric kwargs are expected to be added in the config file; however, when the kwargs entry is empty, calling metric_func with it triggers a None-type error.

I suggest you replace that if condition with the following line instead:
score = metric_func['callable'](y_hat, y, **metric_func.get('kwargs', {}))
replacing the old line 167.

This is a good addition, thx!

Applied the suggestion.
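A minimal sketch of the metric loop with optional kwargs, assuming a hypothetical compute_scores function and a toy mae metric in place of the real callables. Note that dict.get('kwargs', {}) only falls back to {} when the key is missing; if the config leaves the entry empty and it is parsed as None, `or {}` is needed as well, otherwise unpacking None with ** raises a TypeError.

```python
def compute_scores(metrics: dict, y_hat, y) -> dict:
    """Evaluate each metric; dict entries may carry optional kwargs (sketch)."""
    scores = {}
    for metric_name, metric_func in metrics.items():
        if isinstance(metric_func, dict):
            # `or {}` covers both a missing 'kwargs' key and an empty entry parsed as None
            kwargs = metric_func.get('kwargs') or {}
            scores[metric_name] = metric_func['callable'](y_hat, y, **kwargs)
        else:
            scores[metric_name] = metric_func(y_hat, y)
    return scores


def mae(y_hat, y, scale=1.0):
    """Toy mean absolute error with an optional scaling kwarg (hypothetical)."""
    return scale * sum(abs(a - b) for a, b in zip(y_hat, y)) / len(y)


metrics = {
    'mae': mae,
    'mae_none': {'callable': mae, 'kwargs': None},
    'mae_scaled': {'callable': mae, 'kwargs': {'scale': 2.0}},
}
print(compute_scores(metrics, [1.0, 2.0], [1.5, 2.5]))
# {'mae': 0.5, 'mae_none': 0.5, 'mae_scaled': 1.0}
```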

tlarcher · 2024-11-11T17:44:03Z

        """
        device = 'cuda' if torch.cuda.is_available() else 'cpu'
        self.model.to(device)
+        data = data.to(device)


data isn't guaranteed to be a Tensor since this method doesn't require it to be (intentional behavior since the method should be callable by hand on any data type).
If data isn't a Tensor (e.g. a list or a dict), calling .to() on it will raise an error and crash.
The data (or its elements) are cast to device anyway in the following lines.

=> Recommendation: delete this line
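To show why a blind data.to(device) is fragile, here is a minimal sketch of a type-aware cast, assuming a hypothetical move_to_device helper: it only calls .to() on objects that have it and recurses into dicts, lists and tuples. A FakeTensor class stands in for torch.Tensor so the sketch runs without torch; it is not the method used in malpolon.

```python
def move_to_device(data, device):
    """Recursively move anything exposing a .to() method (e.g. tensors) to `device`."""
    if hasattr(data, 'to'):
        return data.to(device)
    if isinstance(data, dict):
        return {k: move_to_device(v, device) for k, v in data.items()}
    if isinstance(data, list):
        return [move_to_device(v, device) for v in data]
    if isinstance(data, tuple):
        return tuple(move_to_device(v, device) for v in data)
    return data  # ints, strings, None, ... are returned unchanged


class FakeTensor:
    """Stand-in for torch.Tensor, only tracking which device it lives on."""

    def __init__(self, device='cpu'):
        self.device = device

    def to(self, device):
        return FakeTensor(device)


batch = {'x': FakeTensor(), 'meta': [1, 'id_42']}
moved = move_to_device(batch, 'cuda')
print(moved['x'].device, moved['meta'])  # cuda [1, 'id_42']
```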

tlarcher · 2024-11-11T17:53:03Z

+                                      lr=lr,
+                                      weight_decay=weight_decay)
+
+        print(optimizer)


Remove debug print

tlarcher · 2024-11-11T18:03:27Z

+
+        print(optimizer)
+
+        if 'regression' in task:


No default value for loss. If task doesn't contain the 'regression' string, loss will be None, which leads to an error. The error is caught by malpolon.models.utils.check_loss(), but I advise handling a 'default loss value' case.

removed the condition since the task is filtered
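A minimal sketch of the 'default loss value' handling the comment suggests, assuming a hypothetical select_loss function and a plain-Python mse standing in for a real loss such as torch.nn.MSELoss(): a user-supplied loss wins, 'regression' tasks get a default, and any other task fails early with an explicit message instead of passing None downstream.

```python
def mse(y_hat, y):
    """Toy mean squared error standing in for a real loss module."""
    return sum((a - b) ** 2 for a, b in zip(y_hat, y)) / len(y)


def select_loss(task: str, loss=None):
    """Return the user-supplied loss, else a default for the task (sketch)."""
    if loss is not None:
        return loss
    if 'regression' in task:
        return mse
    # Fail early with a clear message rather than returning None
    raise ValueError(f"No default loss for task {task!r}; please pass a loss explicitly.")


print(select_loss('regression')([1.0, 3.0], [1.0, 1.0]))  # 2.0
```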

tlarcher · 2024-11-11T18:08:40Z

+
+        model = check_model(model)
+
+        optimizer = torch.optim.AdamW(model.parameters(),


INFO: optimizer instantiation has changed behavior since the v2.1.0 update and is now a user choice (via the config file).

This is still 100% compatible with the update, but consider adding an optimizer input argument and nesting your optimizer instantiation inside an 'if optimizer is None:' block

(see malpolon.models.standard_prediction_systems.ClassificationSystem)
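A minimal sketch of that pattern, assuming a hypothetical resolve_optimizer function and hypothetical default values for lr and weight_decay: an optimizer coming from the config is used as-is, and the AdamW default is only built when none was given. torch is imported lazily here so that the sketch stays runnable without it when an optimizer is supplied; in the actual system class the import would sit at module level.

```python
def resolve_optimizer(params, optimizer=None, lr: float = 1e-3,
                      weight_decay: float = 1e-2):
    """Use the configured optimizer if given, else build a default AdamW (sketch)."""
    if optimizer is not None:
        # The user's choice (e.g. from the experiment config) takes priority
        return optimizer
    import torch  # only the default branch needs torch
    return torch.optim.AdamW(params, lr=lr, weight_decay=weight_decay)
```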

tlarcher · 2024-11-11T18:13:27Z

+
        super().__init__(model, loss, optimizer, metrics=metrics)
+
+    def configure_optimizers(self):


INFO: since the v2.1.0 update, this method doesn't need to be redefined unless you want to impose a fixed optimizer/scheduler, as both of these objects can be defined by the user in their experiment config file.

tlarcher · 2024-11-11T18:16:07Z

 from pathlib import Path
 from typing import Mapping, Union

+import torchmetrics


Import not used?

tlarcher

Reviewed the engine side of your PR as requested. A couple of things to address. I'll let you remove the example/dataset part for now.
When everything is addressed, I'll merge this into a new branch, which will wait for your examples when they are ready.

If you want, you can adapt the (optimizer + config file) parts in accordance with the v2.1.0 update, but that isn't necessary.


tlarcher · 2024-11-12T00:47:38Z

+
+class RegressionSystem(GenericPredictionSystem):
+    """Regression task class."""
+    def __init__(


Some arguments (optimizer kwargs) are not used because they are incompatible with the proposed default optimizer. I suggest replacing them with the default optimizer's arguments, or simply removing them.

Please update the docstring accordingly too.

removed the kwargs


Zathimo and others added 30 commits July 25, 2024 10:53

add RegressionSystem

b6150da

add poverty example

c4bd970

add datasets and outputs to gitignore

701edbc

change description project poverty

06a5518

add datset landsat

5dc5159

add datset_poverty modif

092d882

add datset , .tiff

7cf0c8e

display poverty images / bands

f446276

modif MSDataset

19d22c7

add label.csv

4af7c49

remove dataset from ignore

5e08b87

write LightningDataModule

4574e1a

add main test

9eeca80

add observation

1dfcc81

rename

71bf40d

update description

38f8725

modifications test_dataset

2572ea9

add *.tif, landsat_tiles

87fe8f1

avancement (progress)

fed4d58

test

51a5511

changes

b2f94d5

uptade impot __init__

f583e0d

add config

8925b22

works ?

7572e64

changes

29d7517

add torchmetrics import

87f02ff

add print debugg

8a01f14

add sys.path.append to work with malpolon root package

6b508d5

changes

e0b71ec

works ??

594117c

tlarcher reviewed Nov 8, 2024

tlarcher reviewed Nov 11, 2024

tlarcher requested changes Nov 11, 2024

tlarcher reviewed Nov 12, 2024

tlarcher mentioned this pull request Nov 12, 2024

Created regression_system and updated check_loss #71

Open

Zathimo and others added 13 commits November 12, 2024 16:55

change request

7fe8257

remove poverty example

2f47792

Merge branch 'plantnet:main' into PR

a3f2b1f

Updated glc24_pre_extracted pretrained weights link to address pos_weight issue

0b087be

Updated setup.py for v2.1.1

7662e16

Merge pull request plantnet#72 from plantnet/dev

4687bf1

v2.1.1

Update README.md

8a58c3a

Added Malpolon paper shield badge

Merge branch 'dev'

1f41877

Fixed broken links in root Reamde

0d6294a

git push

194a18e

Merge branch 'dev'

delete data.to(device)

a8fd4ed

Merge remote-tracking branch 'origin/PR' into PR

abc1004

Merge branch 'plantnet:main' into PR

a377812


		model = check_model(model)

		optimizer = torch.optim.AdamW(model.parameters(),


		super().__init__(model, loss, optimizer, metrics=metrics)

		def configure_optimizers(self):

Conversation

Zathimo commented Oct 11, 2024

📝 Changelog

✅ Checklist

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tlarcher Nov 11, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tlarcher left a comment

Choose a reason for hiding this comment

Uh oh!

tlarcher left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

tlarcher Nov 11, 2024 •

edited

Loading