
LLaVa example

A project log for Potpourri

Definition: A mixture of things, especially a musical or literary medley

Peter Walsh, 12/29/2023 at 18:45

An example of LLaVa, downloaded and run by following Donald Papp's recent article.


[NB: This is not a comment on Justine Tunney's work, which is both excellent and valuable. I'm only pointing out that current AI systems leave a lot to be desired.]


I used an example image (below) that was not on the net, and asked the LLM to describe the image.

The entire process took almost 24 minutes, and the description is *almost* accurate.

#
# LLaVa Command
#
/tmp/Pics: time /Data/llama/llava-v1.5-7b-q4-main.llamafile     --image DSCF2753.jpg --temp 0 -ngl 35     -e -p '### User: What do you see?\n### Assistant:'     --silent-prompt 2>/dev/null

The image features a large metal table with a silver top, sitting on
a white floor. On the table, there is an industrial-sized fan or
blower, which appears to be made of metal and has a circular shape.
This fan is placed in the center of the table, drawing attention due
to its size and design. The overall scene gives off a sense of a
workshop or a place where such equipment might be used for various
purposes.

real    23m40.456s
user    93m53.092s
sys    0m8.459s

The long execution time suggested that perhaps a smaller image would be faster, so I resized the image for a 2nd run.

The second run was no faster (24m11s versus 23m40s), and its description is largely inaccurate, with several hallucinated items.

/tmp/Pics/SmallImages: time /Data/llama/llava-v1.5-7b-q4-main.llamafile     --image DSCF2753Small.jpg --temp 0 -ngl 35     -e -p '### User: What do you see?\n### Assistant:'     --silent-prompt 2>/dev/null

The image features a room with a large metal table in the center.
On top of this table, there is an industrial-sized food processor
or blender, which appears to be made from metal and has a silver
color. The table occupies most of the space in the room, extending
almost all the way across the scene.

In addition to the main focus on the table and its contents, there
are several chairs placed around the room, with some located near
the edges of the image. A bench can also be seen in the background,
further emphasizing the room's purpose as a workspace or dining
area.

real    24m11.196s
user    96m11.241s
sys    0m6.162s

Food processor? Chairs?

This AI is not quite ready for prime time.


The original image was 4608x3456; the resized version (2nd run) was 533x44.

The image (large/small) given to LLaVa:


I'm using an older, but still beefy, NVIDIA graphics card with CUDA installed. The system has a 4-core 4GHz CPU and 48GB of memory.
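As a rough check on where the time went, the `time` figures from the two runs can be compared. The sketch below divides user CPU time by wall-clock time. A ratio close to the number of CPU cores would suggest the work was spread across the CPU, which in turn might mean the `-ngl 35` GPU offload was not taking effect. That last step is an inference from the timings only and was not verified. The helper names (`to_seconds`, `cpu_parallelism`) are made up for this illustration.

```python
import re


def to_seconds(stamp: str) -> float:
    """Convert a `time`-style value such as '23m40.456s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time value: {stamp!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Figures copied from the two runs above.
runs = {
    "full-size": {"real": "23m40.456s", "user": "93m53.092s"},
    "resized": {"real": "24m11.196s", "user": "96m11.241s"},
}


def cpu_parallelism(run: dict) -> float:
    """Ratio of user CPU time to wall-clock time for one run."""
    return to_seconds(run["user"]) / to_seconds(run["real"])


for name, run in runs.items():
    # A value near 4 on a 4-core machine means all cores were busy
    # for essentially the whole run.
    print(f"{name}: user/real = {cpu_parallelism(run):.2f}")
```

Both ratios come out just under 4, matching the core count of the machine described above.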
