
WIP: AdaptiveLasso and AdaptiveLassoCV #169

Closed
wants to merge 29 commits

Conversation

@mathurinm (Owner) commented Nov 11, 2020

closes #165

@codecov-io commented Nov 11, 2020

Codecov Report

Merging #169 (93963af) into master (a8a110f) will decrease coverage by 1.61%.
The diff coverage is 74.03%.


@@            Coverage Diff             @@
##           master     #169      +/-   ##
==========================================
- Coverage   86.08%   84.47%   -1.62%     
==========================================
  Files          13       14       +1     
  Lines         913      979      +66     
  Branches      120      122       +2     
==========================================
+ Hits          786      827      +41     
- Misses         97      122      +25     
  Partials       30       30              
Flag        Coverage Δ
unittests   100.00% <ø> (?)

Flags with carried forward coverage won't be shown.

Impacted Files                 Coverage Δ
celer/dropin_sklearn.py        91.92% <65.21%> (-4.48%) ⬇️
celer/tests/test_adaptive.py   69.69% <69.69%> (ø)
celer/homotopy.py              84.46% <80.85%> (-2.72%) ⬇️
celer/__init__.py              100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@mathurinm (Owner, Author)

@josephsalmon @agramfort @QB3
Here's the example: https://183-122246365-gh.circle-artifacts.com/0/dev/auto_examples/plot_adaptivelasso.html#sphx-glr-auto-examples-plot-adaptivelasso-py

Do you agree with the name for the new parameter, n_reweightings?
The docstrings may need polishing; feedback is welcome. As you can see, there is a lot of code/docstring duplicated from the Lasso class...

@QB3 commented Nov 11, 2020

> @josephsalmon @agramfort @QB3
> Here's the example: https://183-122246365-gh.circle-artifacts.com/0/dev/auto_examples/plot_adaptivelasso.html#sphx-glr-auto-examples-plot-adaptivelasso-py
>
> Do you agree with the name for the new parameter, n_reweightings?
> The docstrings may need polishing; feedback is welcome. As you can see, there is a lot of code/docstring duplicated from the Lasso class...

Maybe this is a little bit off topic, but I put it here just in case:

  • I am impressed that the adaptive Lasso does better in prediction for the CV.
  • I am also impressed by the piecewise differentiability of the path for the adaptive Lasso.
    From what I remember, when I played with the adaptive Lasso a while ago, the mse_path was highly discontinuous, so we could not use autodiff to set the regularization parameter. Your plot seems to suggest the opposite; this may be worth investigating.

@mathurinm (Owner, Author)

I don't think it does better in prediction, does it?

If you set n_samples, n_features = 100, 160, the MSE curves go haywire for the adaptive version:
[image: cross-validation MSE curves]
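For reference, here is a minimal sketch of that comparison, assuming AdaptiveLassoCV (added in this PR) exposes the same cv and mse_path_ interface as LassoCV; the data generation below is made up for illustration:

import numpy as np
from celer import LassoCV, AdaptiveLassoCV

# toy problem in the regime discussed above (fewer samples than features)
rng = np.random.RandomState(0)
n_samples, n_features = 100, 160
X = rng.randn(n_samples, n_features)
w_true = np.zeros(n_features)
w_true[:10] = 1.0
y = X @ w_true + 0.5 * rng.randn(n_samples)

for model in (LassoCV(cv=5), AdaptiveLassoCV(cv=5)):
    model.fit(X, y)
    # mse_path_ has shape (n_alphas, n_folds); average over folds to compare curves
    print(type(model).__name__, model.mse_path_.mean(axis=1).min())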

@josephsalmon (Contributor)

This is interesting, and weird!
Can you confirm this is bug-free on the AdaptiveLassoCV side?
Could it be due to a lack of convergence?
The blue dotted line on the right is wild... Is the black line the average of the 5 folds (it seems the median would be less damaged here)?

@@ -100,7 +100,10 @@ def celer(
     cdef int[:] all_features = np.arange(n_features, dtype=np.int32)

     for t in range(max_iter):
-        if t != 0:
+        # if t != 0:
@mathurinm (Owner, Author) commented on the diff:

This introduces a severe regression at the beginning of paths for large X (e.g. finance), where X.T @ theta is recomputed while there are only 3 features in the WS.

Maintaining Xtheta from one alpha to the other may be a fix.
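A very rough illustration of that idea, with a hypothetical path loop and solver signature (not celer's actual internals): keep the dense X.T @ theta vector alive from one alpha to the next instead of recomputing it from scratch.

import numpy as np

def toy_path(X, y, alphas, inner_solver):
    # Hypothetical sketch: Xtheta is computed once and then maintained by the
    # inner solver, rather than rebuilt over all features at every new alpha.
    n_features = X.shape[1]
    w = np.zeros(n_features)
    theta = y / np.max(np.abs(X.T @ y))  # a feasible starting dual point
    Xtheta = X.T @ theta
    coefs = []
    for alpha in alphas:
        # inner_solver (made up here) is assumed to update w, theta and Xtheta in sync
        w, theta, Xtheta = inner_solver(X, y, alpha, w, theta, Xtheta)
        coefs.append(w.copy())
    return np.array(coefs)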

@mathurinm (Owner, Author)

I ran another experiment on the Finance dataset:
[image: CV error curves on the Finance dataset]

The part where it gets funny: AdaptiveLasso uses only one coefficient:

In [26]: (lasso.coef_ != 0).sum()
Out[26]: 267

In [27]: (adaptive_lasso.coef_ != 0).sum()
Out[27]: 1

In [28]: np.where(adaptive_lasso.coef_)
Out[28]: (array([0]),)

In [29]: X0 = X[:, 0].toarray().squeeze()

In [30]: np.dot(X0, y) / np.linalg.norm(y) / np.linalg.norm(X0)
Out[30]: 0.9941543404858482

In [31]: adaptive_lasso.intercept_, lasso.intercept_
Out[31]: (-0.6253583759665271, -1.125413246436354)

In [32]: X0_cent = X0 - X0.mean()

In [33]: y_cent = y - y.mean()

In [34]: np.dot(X0_cent, y_cent) / np.linalg.norm(y_cent) / np.linalg.norm(X0_cent)
Out[34]: 0.8085836249206475

In [35]: lasso.normalize
Out[35]: False

In [36]: sparse.linalg.norm(X, axis=0)
Out[36]: 
array([436.72549038, 214.92614813, 776.73555157, ...,   1.20056613,
         1.47236374,   1.89979315])

On a different note, I think the way the adaptive path is computed is suboptimal: instead of doing

for alpha in alphas:
    for iter in range(n_reweightings):

we should do

for iter in range(n_reweightings):
    for alpha in alphas:

in order to benefit from warm starts. Otherwise, for a fresh new alpha (without weights), initializing with the last reweighting solution from the previous alpha is not very useful (see the sketch below). Does that make sense @josephsalmon?
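For concreteness, a sketch of the inverted loop, assuming a hypothetical weighted_lasso(X, y, alpha, weights, w_init) solver (this is not celer's actual path function):

import numpy as np

def adaptive_lasso_path(X, y, alphas, n_reweightings, weighted_lasso):
    # outer loop over reweightings, inner loop over alphas, so that each pass
    # over alphas can warm start from the previous alpha at the same stage
    n_features = X.shape[1]
    coefs = np.zeros((len(alphas), n_features))
    weights = np.ones((len(alphas), n_features))
    for it in range(n_reweightings):
        for i, alpha in enumerate(alphas):
            # warm start: previous alpha at this stage, or previous stage for the first alpha
            w_init = coefs[i - 1] if i > 0 else coefs[i]
            coefs[i] = weighted_lasso(X, y, alpha, weights[i], w_init)
            # update the per-alpha weights for the next reweighting
            weights[i] = 1.0 / (np.abs(coefs[i]) + 1e-12)
    return coefs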

@josephsalmon (Contributor)

Sparsity = 1 for AdaptiveLassoCV? Really unexpected (normalization issue?).

I am also surprised that for alpha -> 0 the two methods reach different error levels (it should be the OLS performance, but I agree that in high dimension there might be different least-squares solutions).

And I agree with the loop inversion: very good idea!

@mathurinm (Owner, Author)

I am not sure why the two paths should tend to the same solutions as alpha goes to 0: it should be the basis pursuit performance for the Lasso, but for the adaptive Lasso I don't know what the limit is expected to be.

@josephsalmon (Contributor)

You are right, I just thought they would be closer...
My guess would be that the adaptive Lasso with alpha -> 0 should converge to the interpolation solution, i.e. the coef satisfying X coef = y with \norm{coef}_{0.5} minimized.

@@ -276,13 +279,269 @@ def _more_tags(self):
         return {'multioutput': False}


+class AdaptiveLasso(Lasso):
@mathurinm (Owner, Author) commented on the diff:

I am now thinking this is a bad name, as the adaptive Lasso takes weights equal to 1 / w^ols_j and performs a single Lasso.
What we implement is rather iterative l1 reweighting (the Candès-Wakin-Boyd paper).

IterativeReweightedLasso seems more correct to me, but it does not pop up on Google, and I don't know if it is good for visibility. When people talk about it in scikit-learn, they say adaptive Lasso: scikit-learn/scikit-learn#4912
@agramfort @josephsalmon do you have an opinion?

Contributor reply on the diff:

The main naming issue to me is that an estimator should be well-defined mathematically; then the implementation is something under the hood for the user.

Here, the algorithm you are proposing is a DC-programming approach (à la Gasso et al. 2009) for solving sparse regression with \ell_0.5 regularization (also referred to as reweighted l1 by Candès et al.).
Hence, I would be in favor of separating the "theoretical" estimator from the algorithms used (for instance a coordinate descent alternative could be considered as another solver for sparse regression with \ell_0.5 regularization).

I agree that AdaptiveLasso is originally 2-step:

  1. OLS
  2. Lasso reweighted with weights controlled by step 1.

But I think this is vague enough in the original article (any consistent estimator can be used in the first step, not only OLS), so we can reuse this naming.

Potentially, the exponent \gamma (corresponding to the \ell_q norm used in the Adaptive Lasso paper) could be an optional parameter, something like:

lq_norm = 0.5

(with the possibility to add more variants later on).

So in the end, I wouldn't bother too much about the naming and would stick to AdaptiveLasso as a good shortcut.
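To make that suggestion concrete, here is a hedged sketch of how an lq_norm exponent could drive the reweighting; the weights argument of Lasso and the eps smoothing are assumptions for illustration, not something this PR defines:

import numpy as np
from celer import Lasso  # assumed here to accept per-feature weights (hypothetical)

def reweighted_lasso(X, y, alpha, lq_norm=0.5, n_reweightings=5, eps=1e-12):
    # iterative l1 reweighting targeting an ell_q penalty with q = lq_norm
    weights = np.ones(X.shape[1])
    coef = np.zeros(X.shape[1])
    for _ in range(n_reweightings):
        clf = Lasso(alpha=alpha, weights=weights, fit_intercept=False)  # hypothetical weights argument
        clf.fit(X, y)
        coef = clf.coef_
        # d|w|^q/dw is proportional to |w|^(q - 1), hence the classical update
        # weights_j = 1 / (|w_j|^(1 - q) + eps); q = 0.5 gives 1 / (sqrt(|w_j|) + eps)
        weights = 1.0 / (np.abs(coef) ** (1.0 - lq_norm) + eps)
    return coef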

@mathurinm (Owner, Author)

It's way faster this way, but some gap scaling is wrong somewhere: we get negative gaps in the doc examples and on news20:

import time
from celer import AdaptiveLassoCV
from libsvmdata import fetch_libsvm
X, y = fetch_libsvm("news20")
t0 = time.time()
clf = AdaptiveLassoCV(verbose=1, cv=2, n_jobs=2, eps=1e-2).fit(X, y)
dur = time.time() - t0  # fit duration in seconds


Successfully merging this pull request may close these issues.

Example adaptive Lasso