Implement the Phi 3 vision model #351

Merged
merged 184 commits into from
Jun 7, 2024
Merged
Changes from 1 commit
Commits
Show all changes
184 commits
Select commit Hold shift + click to select a range
bf003ed
Begin works on idefics
EricLBuehler May 14, 2024
410de48
Begin works on idefics
EricLBuehler May 14, 2024
95d6394
Implement the vision transformer part
EricLBuehler May 15, 2024
4543f40
Merge branch 'master' into idefics2
EricLBuehler May 15, 2024
c7f8791
Add the connector model
EricLBuehler May 15, 2024
83575dc
Add config
EricLBuehler May 15, 2024
ad69fc5
Merge branch 'master' into idefics2
EricLBuehler May 15, 2024
7ff2b01
Merge
EricLBuehler May 15, 2024
69e4859
Merge branch 'master' into idefics2
EricLBuehler May 15, 2024
fab36bc
Merge branch 'master' into idefics2
EricLBuehler May 15, 2024
0ce1152
Merge branch 'master' into idefics2
EricLBuehler May 16, 2024
345982c
Merge
EricLBuehler May 16, 2024
b1b7bf8
More progress
EricLBuehler May 16, 2024
5797660
Merge branch 'master' into idefics2
EricLBuehler May 18, 2024
31e9e9e
Implement the bucketize functions
EricLBuehler May 20, 2024
6d5af54
Complete the bucketize, unfold functions and finish idefic2 global fo…
EricLBuehler May 21, 2024
e543ba2
Merge branch 'master' into idefics2
EricLBuehler May 21, 2024
477e319
Merge branch 'master' into idefics2
EricLBuehler May 21, 2024
8eb9251
Mask
EricLBuehler May 21, 2024
3e77b03
Clippy
EricLBuehler May 21, 2024
0580be3
Add framework for image pre processors
EricLBuehler May 21, 2024
32f470a
Implement utility functions for image preprocessor
EricLBuehler May 21, 2024
c4bc747
Implement some functions for image processor
EricLBuehler May 21, 2024
7582a81
Merge branch 'master' into idefics2
EricLBuehler May 22, 2024
9e76a4f
Clippy
EricLBuehler May 22, 2024
33f3d0a
Merge branch 'master' into idefics2
EricLBuehler May 22, 2024
e7b5fd6
Calculate pixel values
EricLBuehler May 22, 2024
34994bc
Pass and integrate pixel attention mask
EricLBuehler May 22, 2024
5932aea
Add vision pipeline and major refactor
EricLBuehler May 23, 2024
e8efbe8
Add model category state
EricLBuehler May 23, 2024
5eb7f7a
Remove some todos
EricLBuehler May 23, 2024
9585476
Merge branch 'master' into idefics2
EricLBuehler May 23, 2024
18beaca
Merge branch 'master' into idefics2
EricLBuehler May 23, 2024
4fe2fe6
Get rid of some todos
EricLBuehler May 23, 2024
c84e8fb
Refactor slightly
EricLBuehler May 23, 2024
60c6d41
Prepare inputs for vision model
EricLBuehler May 23, 2024
ab8c6de
Clippy
EricLBuehler May 23, 2024
eca3f16
Add better defaults for image processor
EricLBuehler May 23, 2024
fe24e08
Implement scheduling based on image dims
EricLBuehler May 23, 2024
b0d16f7
Implement scheduling based on image dims
EricLBuehler May 23, 2024
818b740
Better scheduling with pad_to
EricLBuehler May 23, 2024
b6c2747
Properly get images from request
EricLBuehler May 23, 2024
e3167a5
Implement for the http interface
EricLBuehler May 23, 2024
83ee92b
Fix
EricLBuehler May 23, 2024
cfe1b33
Implement preprocessor usage and load processor config
EricLBuehler May 24, 2024
30b5407
Allow handling of content messages
EricLBuehler May 24, 2024
f37a378
Add processor infrastructure
EricLBuehler May 24, 2024
0feb9b0
Load processor based on vision model kind
EricLBuehler May 24, 2024
4185ca2
Add a new test for vision chat templates
EricLBuehler May 24, 2024
ccb0f3c
Clippy
EricLBuehler May 24, 2024
a401b44
Add the vision plain model
EricLBuehler May 24, 2024
7c39e8e
A batch of fixes
EricLBuehler May 24, 2024
9ba563b
Remove arc get_mut usage for adding special tokens
EricLBuehler May 24, 2024
cc8c122
Add idefics2 example
EricLBuehler May 24, 2024
043e071
Fixes with http
EricLBuehler May 24, 2024
4a2bdde
Fixes
EricLBuehler May 24, 2024
a27797d
Calculate padding shapes properly
EricLBuehler May 25, 2024
747f7cd
Fix
EricLBuehler May 25, 2024
8989957
Fixes
EricLBuehler May 25, 2024
469da8c
Fix index select
EricLBuehler May 25, 2024
b10bfd4
Track
EricLBuehler May 25, 2024
2e09066
Merge branch 'master' into idefics2
EricLBuehler May 26, 2024
c09dd81
Fix vision attention mask
EricLBuehler May 26, 2024
b1eac47
Merge branch 'master' into idefics2
EricLBuehler May 26, 2024
fd8412e
Intial work on phi3v
EricLBuehler May 27, 2024
31e62d2
Add the image embedding layer
EricLBuehler May 27, 2024
c519e81
Lints
EricLBuehler May 27, 2024
de1a8d5
Implement the loader
EricLBuehler May 27, 2024
00d2fb5
Add infrastructure for phi3 image processor
EricLBuehler May 28, 2024
e0d9a5b
Merge branch 'master' into phi3_vision
EricLBuehler May 28, 2024
e83bcf1
Merge
EricLBuehler May 28, 2024
71aec32
Merge branch 'master' into phi3_vision
EricLBuehler May 28, 2024
0921b90
Merge
EricLBuehler May 28, 2024
737a9fc
Merge branch 'master' into phi3_vision
EricLBuehler May 28, 2024
7c4c1c0
Merge
EricLBuehler May 28, 2024
b2036bf
Merge
EricLBuehler May 29, 2024
a56c09a
Merge
EricLBuehler May 29, 2024
87bb4ae
Partially implement padding
EricLBuehler May 29, 2024
17589a8
Implement the hd transform step
EricLBuehler May 29, 2024
2b742f6
Merge branch 'master' into phi3_vision
EricLBuehler May 29, 2024
50f7830
Work on the image processor
EricLBuehler May 29, 2024
0960640
Clippy
EricLBuehler May 29, 2024
f550bee
Complete the phi3v inputs processor
EricLBuehler May 31, 2024
37a5b8f
Rename
EricLBuehler May 31, 2024
68e57b6
Merge branch 'master' into phi3_vision
EricLBuehler May 31, 2024
a8d30bd
Merge branch 'master' into phi3_vision
EricLBuehler May 31, 2024
124720a
Merge
EricLBuehler May 31, 2024
0ec9aec
Merge branch 'master' into phi3_vision
EricLBuehler May 31, 2024
5a27c94
Merge branch 'master' into phi3_vision
EricLBuehler May 31, 2024
971ffa9
Merge
EricLBuehler May 31, 2024
24092e0
Rename to phi3v and fix deser
EricLBuehler May 31, 2024
88a2df8
Fix varbuilder
EricLBuehler May 31, 2024
b126d28
Fix varbuilder
EricLBuehler May 31, 2024
6e1a6a8
Default for do convert rgb
EricLBuehler May 31, 2024
989eb32
Some defaults
EricLBuehler May 31, 2024
f42b527
Allow no processor config
EricLBuehler May 31, 2024
7a1b8ce
Setup debug flag
EricLBuehler May 31, 2024
c1d6b48
Add phi3v
EricLBuehler May 31, 2024
1135b46
Implement messages flattening
EricLBuehler May 31, 2024
402fa16
Update
EricLBuehler May 31, 2024
5634a3a
Rewrite the pad, hd transform
EricLBuehler Jun 1, 2024
95f7952
Clippy
EricLBuehler Jun 1, 2024
6889b6e
Detect num channels
EricLBuehler Jun 1, 2024
fc545d8
Fix reshape
EricLBuehler Jun 1, 2024
b153315
Fix global image channel dim
EricLBuehler Jun 1, 2024
34b4020
Fix assert
EricLBuehler Jun 1, 2024
46e4de4
Fix dtype
EricLBuehler Jun 1, 2024
c740f20
Fix gt
EricLBuehler Jun 1, 2024
a49ee63
Fix image id neg
EricLBuehler Jun 1, 2024
6da7303
Fix dim0 of pixel values
EricLBuehler Jun 1, 2024
c39bc50
Fix dtype
EricLBuehler Jun 1, 2024
5e66df0
Check if model supports gemm
EricLBuehler Jun 1, 2024
ff0cf0f
Fix some shape errors
EricLBuehler Jun 1, 2024
d2f955d
Fix some shape errors
EricLBuehler Jun 1, 2024
2dfa979
Fix rank of slice_assign
EricLBuehler Jun 1, 2024
5561db2
Fix image toks
EricLBuehler Jun 1, 2024
1aae49e
Properly downcase
EricLBuehler Jun 1, 2024
6ce1d28
Fix response
EricLBuehler Jun 1, 2024
98269bc
Fix response
EricLBuehler Jun 1, 2024
da3a7da
Allow no images in prompt
EricLBuehler Jun 1, 2024
b8b31c6
Output correct hidden state
EricLBuehler Jun 1, 2024
ca6390b
Fix nonzero and add test
EricLBuehler Jun 1, 2024
a37b32d
Fix n image toks
EricLBuehler Jun 2, 2024
54093a0
Merge branch 'master' into phi3_vision
EricLBuehler Jun 2, 2024
5392dbe
Add mistralrs_vision
EricLBuehler Jun 2, 2024
2872365
Typo
EricLBuehler Jun 2, 2024
4f26273
Fix and add tests
EricLBuehler Jun 2, 2024
25e0a9e
Fix indexing
EricLBuehler Jun 2, 2024
52ec34b
Fix test condition
EricLBuehler Jun 2, 2024
399c178
Fix unsqueeze
EricLBuehler Jun 2, 2024
20aa9a1
Fix dtype for norm
EricLBuehler Jun 2, 2024
40a38ab
Merge
EricLBuehler Jun 3, 2024
9ed7b6f
Update clip
EricLBuehler Jun 3, 2024
cf3693b
Clippy
EricLBuehler Jun 3, 2024
76323ec
Run clip in f32
EricLBuehler Jun 3, 2024
3d3cacd
Run in bf16
EricLBuehler Jun 3, 2024
e6ad82b
Merge branch 'master' into phi3_vision
EricLBuehler Jun 3, 2024
d550662
Run in bf16 again
EricLBuehler Jun 3, 2024
75fc861
Fix dtype
EricLBuehler Jun 3, 2024
3550300
Set toks to have correct context lens
EricLBuehler Jun 4, 2024
7eebd0d
Set toks to have correct context lens
EricLBuehler Jun 4, 2024
401d603
Merge branch 'master' into phi3_vision
EricLBuehler Jun 4, 2024
a8c2b41
Support multiple GGUF files (#379)
EricLBuehler Jun 5, 2024
9b46c1c
Merge branch 'master' into phi3_vision
EricLBuehler Jun 5, 2024
bec9a4b
Merge
EricLBuehler Jun 5, 2024
19ca7ac
Organize normal loading metadata (#381)
EricLBuehler Jun 5, 2024
818808b
Bump version 0.1.13 -> 0.1.14 (#382)
EricLBuehler Jun 5, 2024
9712da6
Patch incorrect unwrap and bump version (#383)
EricLBuehler Jun 5, 2024
798adb4
More verbose logging during loading (#385)
EricLBuehler Jun 5, 2024
89dea1b
Refactor enabling debug logging (#387)
EricLBuehler Jun 5, 2024
5c5476d
Merge branch 'master' into phi3_vision
EricLBuehler Jun 5, 2024
5029063
Merge
EricLBuehler Jun 5, 2024
9b7543b
Merge
EricLBuehler Jun 5, 2024
c6ed513
Refactor enabling debug logging (#387)
EricLBuehler Jun 5, 2024
998aa96
Merge
EricLBuehler Jun 5, 2024
2330f56
Use precise gelu
EricLBuehler Jun 5, 2024
65e1a79
Use correct kernel
EricLBuehler Jun 5, 2024
17a87bc
Debugging commit
EricLBuehler Jun 5, 2024
8d4888b
Merge branch 'master' into phi3_vision
EricLBuehler Jun 6, 2024
40963c9
Add fused bias linear
EricLBuehler Jun 7, 2024
1f2bf87
Merge branch 'master' into phi3_vision
EricLBuehler Jun 7, 2024
428b36f
Finish merge
EricLBuehler Jun 7, 2024
1a89341
Use fused layer in clip
EricLBuehler Jun 7, 2024
3ccd3e6
Save progress
EricLBuehler Jun 7, 2024
1327893
Remove debugs
EricLBuehler Jun 7, 2024
2b8cb17
Update example
EricLBuehler Jun 7, 2024
e7dff6c
Resize exact
EricLBuehler Jun 7, 2024
3b6cbbc
Update interpolate
EricLBuehler Jun 7, 2024
298e56e
Fix batch dim
EricLBuehler Jun 7, 2024
14e3f2f
Update test and transform
EricLBuehler Jun 7, 2024
ced3cab
It works
EricLBuehler Jun 7, 2024
cbccb41
Add some examples
EricLBuehler Jun 7, 2024
6827df2
Merge branch 'master' into phi3_vision
EricLBuehler Jun 7, 2024
21443aa
Allow more than one image
EricLBuehler Jun 7, 2024
1aba518
Add support in python api
EricLBuehler Jun 7, 2024
cdd71ce
Add to toml selector
EricLBuehler Jun 7, 2024
ee69dfd
Update python api
EricLBuehler Jun 7, 2024
d7a7c3c
Overhaul readme and docs
EricLBuehler Jun 7, 2024
77885e6
Update
EricLBuehler Jun 7, 2024
af5b83d
Export vision arch
EricLBuehler Jun 7, 2024
34a21c4
Export vision arch
EricLBuehler Jun 7, 2024
f70370c
Export vision arch
EricLBuehler Jun 7, 2024
d65e884
Fix max img dim
EricLBuehler Jun 7, 2024
600ef37
Fix unwrap
EricLBuehler Jun 7, 2024
Update interpolate
EricLBuehler committed Jun 7, 2024 (unverified: no public signing key uploaded)
commit 3b6cbbc754e7c9d3c6d2a8e1d3f439b68d2488f6
22 changes: 12 additions & 10 deletions mistralrs-core/src/vision_models/phi3_inputs_processor.rs
```diff
@@ -5,7 +5,7 @@ use std::{any::Any, sync::Arc};
 use candle_core::{Device, Result, Tensor};
 use image::{imageops::FilterType, DynamicImage, GenericImage, GenericImageView, Rgba};
 use itertools::Itertools;
-use mistralrs_vision::{ApplyTransforms, Normalize, ToTensor, ToTensorAndResize, Transforms};
+use mistralrs_vision::{ApplyTransforms, InterpolateResize, Normalize, ToTensor, Transforms};
 use regex_automata::meta::Regex;
 use tokenizers::Tokenizer;

```
```diff
@@ -390,15 +390,17 @@ impl ImagePreProcessor for Phi3InputsProcessor {
         };
         // Transforms for the global image (after HD transform, resized)
         let transforms_global = Transforms {
-            input: &ToTensorAndResize {
-                target_h: 336,
-                target_w: 336,
-                filter: FilterType::CatmullRom,
-            },
-            inner_transforms: &[&Normalize {
-                mean: config.image_mean.unwrap_or(Self::DEFAULT_MEAN).to_vec(),
-                std: config.image_std.unwrap_or(Self::DEFAULT_STD).to_vec(),
-            }],
+            input: &ToTensor,
+            inner_transforms: &[
+                &Normalize {
+                    mean: config.image_mean.unwrap_or(Self::DEFAULT_MEAN).to_vec(),
+                    std: config.image_std.unwrap_or(Self::DEFAULT_STD).to_vec(),
+                },
+                &InterpolateResize {
+                    target_h: 336,
+                    target_w: 336,
+                },
+            ],
         };

         // Resize with bicubic interpolation
```
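After this change the global image goes through `ToTensor`, then `Normalize`, then `InterpolateResize`: pixels become a float tensor and are normalized per channel before being resized to 336×336. The first two steps can be sketched in plain Rust as below. This is an illustration, not the mistralrs-vision code: it assumes `ToTensor` yields a channels-first (CHW) layout scaled to [0, 1], as torchvision's `ToTensor` does, and the `Chw` struct, the function names, and the mean/std values in the usage are hypothetical (the real `DEFAULT_MEAN`/`DEFAULT_STD` values are not shown in this diff).

```rust
/// Hypothetical stand-in for a channels-first float tensor
/// (the real pipeline uses `candle_core::Tensor`).
#[derive(Debug, Clone, PartialEq)]
pub struct Chw {
    pub c: usize,
    pub h: usize,
    pub w: usize,
    /// Values laid out as [channel][row][column].
    pub data: Vec<f32>,
}

/// Convert interleaved HWC `u8` pixels to a CHW `f32` buffer in [0, 1].
/// Assumption: mirrors a torchvision-style `ToTensor`.
pub fn to_tensor(pixels: &[u8], h: usize, w: usize, c: usize) -> Chw {
    assert_eq!(pixels.len(), h * w * c, "buffer must hold h * w * c bytes");
    let mut data = vec![0.0f32; c * h * w];
    for y in 0..h {
        for x in 0..w {
            for ch in 0..c {
                let src = (y * w + x) * c + ch; // interleaved input index
                let dst = (ch * h + y) * w + x; // planar output index
                data[dst] = pixels[src] as f32 / 255.0;
            }
        }
    }
    Chw { c, h, w, data }
}

/// Per-channel normalization: out = (x - mean[ch]) / std[ch].
pub fn normalize(t: &Chw, mean: &[f32], std: &[f32]) -> Chw {
    assert!(mean.len() == t.c && std.len() == t.c, "need one mean/std per channel");
    let plane = t.h * t.w; // elements per channel
    let data = t
        .data
        .iter()
        .enumerate()
        .map(|(i, &v)| {
            let ch = i / plane; // channel index of element i
            (v - mean[ch]) / std[ch]
        })
        .collect();
    Chw { c: t.c, h: t.h, w: t.w, data }
}
```

For example, `normalize(&to_tensor(&[255, 0, 51], 1, 1, 3), &[0.5; 3], &[0.5; 3])` maps a single RGB pixel to roughly `[1.0, -1.0, -0.6]`.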
2 changes: 1 addition & 1 deletion mistralrs-vision/src/lib.rs
```diff
@@ -2,7 +2,7 @@ use candle_core::{Device, Result, Tensor};
 use image::DynamicImage;
 mod transforms;
 pub(crate) mod utils;
-pub use transforms::{Normalize, ToTensor, ToTensorAndResize};
+pub use transforms::{InterpolateResize, Normalize, ToTensor};

 pub trait ImageTransform {
     type Input;
```
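`ImageTransform` gives each transform associated `Input` and `Output` types. Changing `InterpolateResize` from `Input = DynamicImage` to `Input = Tensor` (with `Output = Self::Input`) is what lets it be listed in `inner_transforms` next to `Normalize` rather than serving as the pipeline's `input` step. The std-only sketch below shows that composition pattern under simplifying assumptions: `Tensor` is a hypothetical stand-in, `map` drops the `&Device` argument and `Result` return type seen in the diff, and the `Transforms` layout (one entry step followed by a slice of tensor-to-tensor trait objects) is inferred from how the struct is built in `phi3_inputs_processor.rs`, not copied from the library. `ToFloat` and `Affine` are hypothetical toy transforms.

```rust
/// Hypothetical stand-in for `candle_core::Tensor`: a shape plus flat data.
#[derive(Debug, Clone, PartialEq)]
pub struct Tensor {
    pub shape: Vec<usize>,
    pub data: Vec<f32>,
}

/// Simplified `ImageTransform`: the real `map` also takes `&Device`
/// and returns `candle_core::Result<Self::Output>`.
pub trait ImageTransform {
    type Input;
    type Output;
    fn map(&self, x: &Self::Input) -> Self::Output;
}

/// Hypothetical pipeline: one entry step producing a tensor,
/// then any number of tensor-to-tensor steps applied in order.
pub struct Transforms<'a> {
    pub input: &'a dyn ImageTransform<Input = Vec<u8>, Output = Tensor>,
    pub inner_transforms: &'a [&'a dyn ImageTransform<Input = Tensor, Output = Tensor>],
}

impl Transforms<'_> {
    pub fn apply(&self, x: &Vec<u8>) -> Tensor {
        let mut t = self.input.map(x);
        for step in self.inner_transforms {
            t = step.map(&t);
        }
        t
    }
}

/// Toy entry step: raw bytes to floats in [0, 1].
pub struct ToFloat;

impl ImageTransform for ToFloat {
    type Input = Vec<u8>;
    type Output = Tensor;
    fn map(&self, x: &Vec<u8>) -> Tensor {
        Tensor {
            shape: vec![x.len()],
            data: x.iter().map(|&v| v as f32 / 255.0).collect(),
        }
    }
}

/// Toy tensor-to-tensor step: (x - mean) / std on every element.
pub struct Affine {
    pub mean: f32,
    pub std: f32,
}

impl ImageTransform for Affine {
    type Input = Tensor;
    type Output = Self::Input; // same pattern as `InterpolateResize` in the diff
    fn map(&self, x: &Tensor) -> Tensor {
        Tensor {
            shape: x.shape.clone(),
            data: x.data.iter().map(|v| (v - self.mean) / self.std).collect(),
        }
    }
}
```

Because every inner step has the same `Tensor -> Tensor` signature, steps of different concrete types can share one slice, just as `&Normalize { .. }` and `&InterpolateResize { .. }` do in the processor.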
21 changes: 7 additions & 14 deletions mistralrs-vision/src/transforms.rs
```diff
@@ -1,9 +1,6 @@
 use crate::utils::{get_pixel_data, n_channels};
 use candle_core::{DType, Device, Result, Tensor};
-use image::{
-    imageops::{self, FilterType},
-    DynamicImage, GenericImageView,
-};
+use image::{DynamicImage, GenericImageView};

 use crate::ImageTransform;

```
```diff
@@ -77,21 +74,17 @@ impl ImageTransform for Normalize {

 /// Do what `ToTensor` does, but also resize the image without preserving
 /// aspect ratio.
-pub struct ToTensorAndResize {
+pub struct InterpolateResize {
     pub target_w: usize,
     pub target_h: usize,
-    pub filter: FilterType,
 }

-impl ImageTransform for ToTensorAndResize {
-    type Input = DynamicImage;
-    type Output = Tensor;
+impl ImageTransform for InterpolateResize {
+    type Input = Tensor;
+    type Output = Self::Input;

-    fn map(&self, x: &Self::Input, device: &Device) -> Result<Self::Output> {
-        let n_channels = n_channels(x);
-        let img = imageops::resize(x, self.target_w as u32, self.target_h as u32, self.filter);
-        let data = get_pixel_data(n_channels, img, self.target_h, self.target_w);
-        ToTensor::to_tensor(device, n_channels, data)
+    fn map(&self, x: &Self::Input, _: &Device) -> Result<Self::Output> {
+        x.interpolate2d(self.target_h, self.target_w)
     }
 }
```
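The new `InterpolateResize` delegates to candle's `Tensor::interpolate2d`, which operates on a rank-4 `(batch, channels, height, width)` tensor and samples nearest neighbours, rather than applying the Catmull-Rom filter that the removed `ToTensorAndResize` passed to `imageops::resize`. Because a nearest-neighbour resize only copies existing values, a per-channel normalization gives the same result whether it runs before or after the resize. The sketch below illustrates both points on a plain CHW buffer. The function names are hypothetical, and the floor-based index mapping (`src = floor(dst * in / out)`, clamped to the last index) is a common convention that may not match candle's exact rounding.

```rust
/// Nearest-neighbour resize of a CHW buffer from (h, w) to (out_h, out_w).
/// Each output pixel copies the input pixel at floor(dst * in / out),
/// clamped to the last valid index.
pub fn resize_nearest_chw(
    data: &[f32],
    c: usize,
    h: usize,
    w: usize,
    out_h: usize,
    out_w: usize,
) -> Vec<f32> {
    assert_eq!(data.len(), c * h * w, "buffer must hold c * h * w values");
    let mut out = Vec::with_capacity(c * out_h * out_w);
    for ch in 0..c {
        for y in 0..out_h {
            let sy = (y * h / out_h).min(h - 1); // source row
            for x in 0..out_w {
                let sx = (x * w / out_w).min(w - 1); // source column
                out.push(data[(ch * h + sy) * w + sx]);
            }
        }
    }
    out
}

/// Per-channel (x - mean[ch]) / std_dev[ch] on a CHW buffer with `c` channels.
pub fn normalize_chw(data: &[f32], c: usize, mean: &[f32], std_dev: &[f32]) -> Vec<f32> {
    let plane = data.len() / c; // elements per channel
    data.iter()
        .enumerate()
        .map(|(i, &v)| (v - mean[i / plane]) / std_dev[i / plane])
        .collect()
}
```

Upscaling the single-channel 2×2 buffer `[1, 2, 3, 4]` to 4×4 duplicates each value into a 2×2 block, and `resize_nearest_chw(&normalize_chw(..), ..)` equals `normalize_chw(&resize_nearest_chw(..), ..)` element for element. With an interpolating filter such as bicubic that equality would hold only approximately in general, so the ordering choice matters less here than it would with the removed Catmull-Rom path.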