r1.2 improvements, kind of works in Minecraft.
whilo committed Feb 26, 2025
1 parent 803b26c commit 25153f1
Showing 2 changed files with 67 additions and 50 deletions.
25 changes: 14 additions & 11 deletions resources/prompts/plaicraft-short.txt
@@ -2,7 +2,6 @@ You are a helpful teacher and co-player of Minecraft, playing a shared research
Your name is Yachiusu. You should act like a 12 year old boy who happens to know a great deal about Minecraft but not necessarily a great deal about the world.
You are nice and humorous and make brief direct statements.
You can act in this plaicraft environment by speaking and by directly generating mouse movements and key presses that will be acted upon in the Minecraft world.
Unfortunately you only get to look at the screen every 10 seconds or so, and you only get to act given a textual description of what you see at that time.
In some sense you are blind, but you see this as a challenge to be overcome. To do this you can ask questions about the people who are playing. In the chat you may type messages that can be read by everyone.
Chatting is one of the safest actions you can take. You can also speak in the environment. You have a long history of playing and you say things like those in the
following transcript.
@@ -149,20 +148,24 @@ Yachiusu: No. No way. Okay, no way. Okay, that's just embarrassing. Okay, now th
Yachiusu: Oh, yeah. Alright, good game. Yeah, likewise. Where'd Yachiusu go?

OK. Base your personality on this. Below this point you will have a growing history of what you have seen, done, said, and heard in the plaicraft environment.
Using these you must produce words to be spoken and actions to be taken over a 10 to 15 second time horizon, knowing that you will only get to see the screen every 10 seconds.
It would be wise to ask those around you, if there are other players around you, what it is safe to do and what you should not do. Or to otherwise be very slow and deliberate, checking
after each action if the action you took had the intended consequence. Remember that "space" is jump. If someone says it is OK to run forward, press and hold "w"
and "space" every once in a while. The rest of Minecraft commands should be familiar to you. Provide output for the next 20 seconds.
Using these you must produce words to be spoken and actions to be taken over the next 30-second time horizon, knowing that you will only get to see the screen at these intervals. Be optimistic and emit long action sequences.

To move your head you need to move the mouse in relative pixel coordinates as shown in the example below. The screen is typically at 1920x1080 resolution and movement in Minecraft is relative to the screen center; on the y-axis, up is positive and down is negative.

Pay close attention to the incoming audio messages on "audio-in" from other users, in particular questions. You can sometimes hear your own messages from audio-out there as well; ignore those.

Reply with statements and actions in the following JSON format only! Here is an example:

```json
[{"action": "statement", "text": "What?"},
[{"action": "statement", "text": "What do you think?"},
{"action": "press-keys", "keys": ["w"], "duration": 0.3},
{"action": "statement", "text": "Forward!"},
{"action": "mouse-move", "relative": [20, -5]},
{"action": "mouse-click", "button": "left"},
{"action": "press-keys", "keys": ["space", "w"], "duration": 3.2},
{"action": "statement", "text": "Upwards!"}
{"action": "statement", "text": "Forward to the right bottom:"},
{"action": "mouse-move", "relative": [230, -150]},
{"action": "mouse-click", "button": "left", "duration": 0.1},
{"action": "press-keys", "keys": ["space", "w"], "duration": 3.2},
{"action": "statement", "text": "Jumping forward! Yippee."},
{"action": "press-keys", "keys": ["space", "w"], "duration": 10.0},
{"action": "mouse-move", "relative": [500, 0]},
{"action": "press-keys", "keys": ["space", "w"], "duration": 4.0}
]
```
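The prompt asks for actions covering a 30-second horizon. One rough way to see how much of that horizon a reply actually scripts is to add up the `duration` fields of the timed actions. The sketch below is an illustration only: `scripted-seconds`, `default-duration` and `example-actions` are hypothetical names, the 0.1 fallback mirrors the `(or duration 0.1)` defaults in `ubuntu.clj`, and the time spent speaking statements is not counted because the format carries no duration for them.

```clojure
;; Fallback mirroring the (or duration 0.1) defaults in ubuntu.clj.
(def default-duration 0.1)

(defn scripted-seconds
  "Adds up the hold times of press-keys and mouse-click actions.
  Statements and mouse moves count as 0 because the format gives
  them no duration."
  [actions]
  (reduce + 0.0
          (for [{:keys [action duration]} actions
                :when (contains? #{"press-keys" "mouse-click"} action)]
            (or duration default-duration))))

;; The example reply from the prompt, already parsed into keyword-keyed maps.
(def example-actions
  [{:action "statement" :text "What do you think?"}
   {:action "press-keys" :keys ["w"] :duration 0.3}
   {:action "statement" :text "Forward to the right bottom:"}
   {:action "mouse-move" :relative [230 -150]}
   {:action "mouse-click" :button "left" :duration 0.1}
   {:action "press-keys" :keys ["space" "w"] :duration 3.2}
   {:action "statement" :text "Jumping forward!"}
   {:action "press-keys" :keys ["space" "w"] :duration 10.0}
   {:action "mouse-move" :relative [500 0]}
   {:action "press-keys" :keys ["space" "w"] :duration 4.0}])

(println "Seconds of key and button input scripted:"
         (scripted-seconds example-actions))
```

For the example reply the timed actions add up to roughly 17.6 seconds, so speech and mouse movement would have to fill the remainder of the 30-second window.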
92 changes: 53 additions & 39 deletions src/is/simm/runtimes/ubuntu.clj
@@ -10,8 +10,6 @@
[hyperfiddle.rcf :refer [tests]]
[datahike.api :as d]

[clojure.edn :as edn]
[clojure.data.json :as json]
[clojure.spec.alpha :as s]
[clojure.string :as str]
[clojure.java.io :as io]
@@ -268,26 +266,24 @@
(defn press-keys [keys duration]
(py/with-gil-stack-rc-context
(let [keys (map clj->ecode keys)
duration (or duration 0.1)
ui (UInput)]
(if key
(do
(doseq [key keys
:when key]
(py. ui write (py.- ecodes EV_KEY) key 1)) ;; Key press
(py. ui syn)
(time/sleep duration)
(doseq [key keys
:when key]
(py. ui write (py.- ecodes EV_KEY) key 0)) ;; Key release
(py. ui syn)
(py. ui close))
(warn "Invalid key" key))
(doseq [key keys
:when key]
(py. ui write (py.- ecodes EV_KEY) key 1)) ;; Key press
(py. ui syn)
(time/sleep duration)
(doseq [key keys
:when key]
(py. ui write (py.- ecodes EV_KEY) key 0)) ;; Key release
(py. ui syn)
(py. ui close))))

(defn mouse-move [relative]
(py/with-gil-stack-rc-context
(let [[x y] relative
ui (UInput {(py.- ecodes EV_REL) [(py.- ecodes REL_X) (py.- ecodes REL_Y)]})]
ui (UInput {(py.- ecodes EV_KEY) [(py.- ecodes BTN_LEFT) (py.- ecodes BTN_RIGHT)]
(py.- ecodes EV_REL) [(py.- ecodes REL_X) (py.- ecodes REL_Y)]})]
(if (and x y)
(do
(py. ui write (py.- ecodes EV_REL) (py.- ecodes REL_X) x)
@@ -300,6 +296,7 @@
(py/with-gil-stack-rc-context
(let [key ({:left (py.- ecodes BTN_LEFT)
:right (py.- ecodes BTN_RIGHT)} button)
duration (or duration 0.1)
ui (UInput {(py.- ecodes EV_KEY) [(py.- ecodes BTN_LEFT) (py.- ecodes BTN_RIGHT)]})]
(if key
(do
@@ -314,9 +311,25 @@
(comment
(press-keys [:a] 0.1)

(mouse-move [13 500])
(mouse-move [0 80])

(dotimes [_ 10]
(Thread/sleep 2000)
(mouse-move [0 200]))

(mouse-click :right 2.0)

(mouse-click :right-click 2.0))
(let [[x y] [200 200]
ui (UInput {(py.- ecodes EV_REL) [(py.- ecodes REL_X) (py.- ecodes REL_Y)]})]
(if (and x y)
(do
(py. ui write (py.- ecodes EV_REL) (py.- ecodes REL_X) x)
(py. ui write (py.- ecodes EV_REL) (py.- ecodes REL_Y) y)))
(py. ui syn)
(py. ui close))


)


;; ===== Agent =====
@@ -426,28 +439,29 @@
events))


(s/def ::actions (s/coll-of (s/or :statement (s/keys :req-un [::action ::text]
:opt-un [::duration])
:press-keys (s/keys :req-un [::action ::keys])
:mouse-move (s/keys :req-un [::action ::relative])
:mouse-click (s/keys :req-un [::action ::button]
:opt-un [::duration]))))

(s/def ::actions (s/coll-of (s/or :key (s/keys :req-un [::key ::duration])
:mouse (s/keys :req-un [::mouse]))))

(defn parse-spec [input spec default]
(try
(let [p (second (.split input "```clojure"))
p (first (.split p "```"))
p (edn/read-string p)]
(if (s/valid? spec p) p default))
(catch Exception _ default)))
(s/def ::action string?)
(s/def ::text string?)
(s/def ::keys (s/coll-of string?))
(s/def ::duration (s/and number? pos?)) ;; pos? alone throws on non-numeric input
(s/def ::button string?)
(s/def ::relative (s/tuple number? number?))

(defn parse-json [input default]
(try
(let [p (second (.split input "```json"))
p (first (.split p "```"))
p (json/read-str p)]
p)
p (json/read-str p :key-fn keyword)]
(if (s/valid? ::actions p)
p default))
(catch Exception _ default)))
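
`parse-json` leans on its `catch` clause to handle replies that contain no fenced block. As a hedged alternative, the fence extraction step could return `nil` explicitly when a fence is missing. This is a minimal sketch using only `clojure.string`; `fence`, `extract-json-block` and `reply` are hypothetical names that do not appear in the commit.

```clojure
(require '[clojure.string :as str])

;; Three backticks, built programmatically so this example holds no literal fence.
(def fence (apply str (repeat 3 \`)))

(defn extract-json-block
  "Returns the trimmed text between the first json fence and the next
  fence in reply, or nil when either fence is missing."
  [reply]
  (when-let [start (str/index-of reply (str fence "json"))]
    (let [body-start (+ start (count fence) (count "json"))]
      (when-let [end (str/index-of reply fence body-start)]
        (str/trim (subs reply body-start end))))))

;; A made-up model reply with some chatter around the fenced block.
(def reply
  (str "Sure!\n"
       fence "json\n"
       "[{\"action\": \"statement\", \"text\": \"Hi\"}]\n"
       fence "\nBye"))

(println (extract-json-block reply))      ; the JSON array text
(println (extract-json-block "no fence")) ; nil
```

The returned string is what would then be handed to `json/read-str` with `:key-fn keyword` and checked against `::actions`, as `parse-json` does.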



(defn baseline-0 [conn]
(m/race (audio-listen (:microphone audio-devices)
10
@@ -528,7 +542,8 @@
(d/q '[:find (pull ?e [:*]) :where [?e :event/created ?c]])
(map first)
(sort-by :event/created)
reverse)
reverse
events->openai-messages)


(def baseline-0-test (baseline-0 conn))
@@ -559,7 +574,7 @@
:event/created (java.util.Date.)
:event/role "developer"}]))

(screen-watch 10
#_(screen-watch 10
#(d/transact! conn [{:screen/file %1
:screen/transcript %2
:event/created (java.util.Date.)
@@ -569,17 +584,18 @@
(loop []
(debug "talk & action loop")
(let [system-prompt (slurp (io/resource "prompts/plaicraft-short.txt"))
last-screen (->> @conn
#_#_last-screen (->> @conn
(d/q '[:find ?s ?c :where [?e :screen/file ?s] [?e :event/created ?c]])
(sort-by second)
reverse
first ;; newest
first)
last-screen (str "/tmp/screenshot-" (rand-int 1000000) ".png")
_ (m/? (screenshot last-screen))
messages (->> @conn
(d/q '[:find (pull ?e [:*]) :where [?e :event/created ?c]])
(map first)
(sort-by :event/created)
reverse
events->openai-messages)]
(debug "last-screen" last-screen)
(debug "messages" messages)
@@ -599,7 +615,7 @@
(d/transact! conn [{:assistant/output output
:event/created (java.util.Date.)
:event/role "developer"}])
(doseq [{:strs [action keys duration button relative text]} actions]
(doseq [{:keys [action keys duration button relative text]} actions]
(case action
"statement" (m/? (play-audio (m/? (<! (tts-1 text)))))
"press-keys" (press-keys (map keyword keys) duration)
@@ -645,5 +661,3 @@
(defn fastest [& args]
(m/absolve (apply m/race (map m/attempt args))))


(comment)
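
The change from `:strs` to `:keys` destructuring in the action loop goes together with parsing the reply using `:key-fn keyword`: once the JSON object keys arrive as keywords, string-key destructuring would bind everything to `nil`. The sketch below illustrates this with plain maps. `describe-action` is a hypothetical stand-in that returns data instead of calling `play-audio`, `press-keys`, `mouse-move` or `mouse-click`, and converting `button` with `keyword` is an assumption, since that part of the loop is not shown in the diff.

```clojure
(defn describe-action
  "Hypothetical stand-in for the dispatch in the talk & action loop.
  Returns a description vector instead of touching audio or uinput."
  [{:keys [action keys duration button relative text]}]
  (case action
    "statement"   [:say text]
    "press-keys"  [:press (mapv keyword keys) (or duration 0.1)]
    "mouse-move"  [:move relative]
    "mouse-click" [:click (keyword button) (or duration 0.1)] ; assumed conversion
    [:unknown action]))

;; What json/read-str yields with :key-fn keyword ...
(def keyword-keyed {:action "press-keys" :keys ["space" "w"] :duration 3.2})
;; ... and what it yields without a key function.
(def string-keyed {"action" "press-keys" "keys" ["space" "w"] "duration" 3.2})

(println (describe-action keyword-keyed)) ; [:press [:space :w] 3.2]
(println (describe-action string-keyed))  ; [:unknown nil]
```

With string keys every binding is `nil`, so the `case` falls through to its default branch, which is why the parser and the destructuring form need to agree on the key type.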
