
Releases: ngxson/wllama

1.16.3

07 Oct 09:49
ffcd98a

What's Changed

  • sync to latest upstream source code by @ngxson in #125

Thanks to a small refactoring in llama.cpp, the binary size is now reduced from 1.78 MB to 1.52 MB

Full Changelog: 1.16.2...1.16.3

1.16.2

23 Sep 16:15
d9b849e

What's Changed

  • decode/encode : do not fail on empty batch by @ngxson in #118
  • Update to latest llama.cpp source code by @ngxson in #119

Full Changelog: 1.16.1...1.16.2

1.16.1

06 Sep 14:29
7beefeb

What's Changed

Full Changelog: 1.16.0...1.16.1

1.16.0

19 Aug 10:04

SmolLM-360m has been added as a model in the main example. Try it now --> https://huggingface.co/spaces/ngxson/wllama

Special thanks to the @huggingface team for providing such a powerful model at such a small size!


What's Changed

  • ability to use custom cacheManager by @ngxson in #109

Full Changelog: 1.15.0...1.16.0

1.15.0

03 Aug 20:34
667dd91

New features

downloadModel()

Downloads a model into the cache without loading it. The use case is to let an application provide a "model manager" screen that can:

  • Download model via downloadModel()
  • List all downloaded models using CacheManager.list()
  • Delete a downloaded model using CacheManager.delete()
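A minimal sketch of that model-manager flow. The `WllamaLike` interface below is a local stand-in mirroring the API names mentioned in these notes (the real library's types and the exact `cacheManager` accessor may differ), and the model URL in the test is illustrative:

```typescript
// Local stand-ins for the API surface described above, not the library's types.
interface CacheEntryLike {
  name: string;
  size: number;
}

interface WllamaLike {
  downloadModel(url: string): Promise<void>;
  cacheManager: {
    list(): Promise<CacheEntryLike[]>;
    delete(name: string): Promise<void>;
  };
}

// Download a model into the cache (without loading it), then return the
// names of all cached models — the data a "model manager" screen would show.
async function manageModels(wllama: WllamaLike, url: string): Promise<string[]> {
  await wllama.downloadModel(url); // step 1: download to cache only
  const entries = await wllama.cacheManager.list(); // step 2: list cache
  return entries.map((e) => e.name);
}
```

Deleting an entry would then be a matter of calling `cacheManager.delete()` with one of the listed names.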

KV cache reuse in createCompletion

When calling createCompletion, you can pass useCache: true as an option. This reuses the KV cache from the previous createCompletion call; it is equivalent to the cache_prompt option on the llama.cpp server.

wllama.createCompletion(input, {
  useCache: true,
  ...
});

For example:

  • On the first call, you have 2 messages: user: hello, assistant: hi
  • On the second call, you add one message: user: hello, assistant: hi, user: who are you?

Then, only the added message user: who are you? will need to be evaluated.
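The saving can be illustrated with a small standalone helper (not part of the wllama API) that computes which part of the new prompt actually needs evaluation, namely everything after the longest common prefix with the previous prompt:

```typescript
// Standalone illustration of KV cache reuse: only the tokens after the
// longest common prefix of the previous and next prompt must be evaluated;
// the prefix is already present in the KV cache.
function tokensToEvaluate(prevPrompt: string[], nextPrompt: string[]): string[] {
  let shared = 0;
  while (
    shared < prevPrompt.length &&
    shared < nextPrompt.length &&
    prevPrompt[shared] === nextPrompt[shared]
  ) {
    shared++;
  }
  return nextPrompt.slice(shared); // work the model still has to do
}
```

With the two-call example above, the shared prefix covers the first two messages, so only `user: who are you?` remains to be processed.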

What's Changed

Full Changelog: 1.14.2...1.15.0

1.14.2

28 Jul 11:39
d15748b

Update to latest upstream llama.cpp source code:

  • Fix support for Llama 3.1, Phi-3 and SmolLM

Full Changelog: 1.14.0...1.14.2

1.14.0

10 Jul 11:51
94ebb81

What's Changed

  • save ETag metadata, add allowOffline option in #90
  • Added experimental support for encoder-decoder architecture #91
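The ETag + allowOffline combination can be sketched as a standalone cache-revalidation routine (a conceptual illustration, not wllama's internal code; the `fetchFn` callback stands in for a real HTTP request):

```typescript
// Conceptual sketch of ETag-based revalidation with an offline fallback:
// send the stored ETag with each request; on 304 reuse the cached bytes,
// and when the network is unreachable, fall back to the cache if allowed.
interface CachedModel {
  etag: string;
  data: Uint8Array;
}

async function fetchWithEtag(
  url: string,
  cache: Map<string, CachedModel>,
  allowOffline: boolean,
  fetchFn: (url: string, etag?: string) => Promise<{ status: number; etag: string; data: Uint8Array }>,
): Promise<Uint8Array> {
  const cached = cache.get(url);
  try {
    const res = await fetchFn(url, cached?.etag);
    if (res.status === 304 && cached) return cached.data; // unchanged upstream
    cache.set(url, { etag: res.etag, data: res.data }); // store new version
    return res.data;
  } catch (err) {
    if (allowOffline && cached) return cached.data; // offline → serve cache
    throw err; // no cache to fall back to
  }
}
```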

Full Changelog: 1.13.0...1.14.0

1.13.0

03 Jul 15:13
44a4de5

What's Changed

New Contributors

Full Changelog: 1.12.1...1.13.0

1.12.1

27 Jun 20:49
b847495

What's Changed

  • Sync with latest upstream source code + adapt to project structure change by @ngxson in #77

Full Changelog: 1.12.0...1.12.1

1.12.0

24 Jun 15:29
896c160

Important

In prior versions, if you initialized wllama with embeddings: true, you were still able to generate completions.

From v1.12.0, if you start wllama with embeddings: true, createCompletion throws an error. You must call wllama.setOptions({ embeddings: false }) to turn off embeddings first.

More details: this behavior comes from ggerganov/llama.cpp#7477, which allows models like GritLM to be used for both embeddings and text generation.
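A minimal sketch of switching between the two modes. `WllamaLike` is a local stand-in for the API names used in these notes (the real method signatures, and option names such as `createEmbedding`, may differ):

```typescript
// Local stand-in for the API surface described above, not the library's types.
interface WllamaLike {
  setOptions(opts: { embeddings: boolean }): void;
  createEmbedding(text: string): Promise<number[]>;
  createCompletion(prompt: string, opts?: object): Promise<string>;
}

// Assumes the instance was started with embeddings: true, so embeddings work
// but completions throw until embeddings mode is switched off.
async function embedThenComplete(wllama: WllamaLike) {
  const vec = await wllama.createEmbedding('hello'); // allowed in embeddings mode
  wllama.setOptions({ embeddings: false }); // required since v1.12.0
  const out = await wllama.createCompletion('hello'); // now allowed
  return { vec, out };
}
```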

What's Changed

Full Changelog: 1.11.0...1.12.0