Compare commits

..

105 Commits

Author SHA1 Message Date
bunnei
10118c71e0 memory: Simplify rasterizer cache operations. 2019-03-16 00:41:08 -04:00
bunnei
47b622825c Merge pull request #2237 from bunnei/cache-host-addr
gpu: Use host address for caching instead of guest address.
2019-03-16 00:05:24 -04:00
bunnei
06ac6460d3 Merge pull request #2048 from FearlessTobi/port-3924
Port citra-emu/citra#3924: "citra_qt: Settings (configuration) rework"
2019-03-15 22:23:38 -04:00
bunnei
2eaf6c41a4 gpu: Use host address for caching instead of guest address. 2019-03-14 22:34:42 -04:00
bunnei
84d3cdf7d7 Merge pull request #2233 from ReinUsesLisp/morton-cleanup
video_core/morton: Miscellaneous changes
2019-03-14 21:23:12 -04:00
bunnei
6788ebffc8 Merge pull request #2229 from ReinUsesLisp/vk-sampler-cache
vk_sampler_cache: Implement a sampler cache
2019-03-14 21:22:34 -04:00
bunnei
2d9546848e Merge pull request #2230 from lioncash/global
kernel/process: Remove use of global system accessors
2019-03-14 20:42:46 -04:00
bunnei
8bd17aa044 Merge pull request #2216 from ReinUsesLisp/rasterizer-system
gl_rasterizer: Use system instance passed from argument
2019-03-14 16:37:05 -04:00
bunnei
4e6c667586 Merge pull request #2227 from lioncash/override
renderer_opengl/gl_global_cache: Add missing override specifiers
2019-03-13 17:05:49 -04:00
ReinUsesLisp
ffe2e50458 video_core/morton: Use enum to describe MortonCopyPixels128 mode 2019-03-13 16:35:21 -03:00
ReinUsesLisp
6ed6129b4f video_core/morton: Remove unused parameter in MortonSwizzle 2019-03-13 16:35:10 -03:00
ReinUsesLisp
9030a8259f video_core/morton: Remove clang-format off when it's not needed 2019-03-13 16:16:45 -03:00
ReinUsesLisp
fdf76a25ab video_core/morton: Remove unused functions 2019-03-13 16:15:54 -03:00
bunnei
e7850a7f11 Merge pull request #2226 from lioncash/private
kernel/server_port: Make data members private
2019-03-13 14:44:21 -04:00
bunnei
c1ea6a39a0 Merge pull request #2223 from lioncash/error
core/hle/result: Tidy up the base error code result header.
2019-03-13 14:43:14 -04:00
bunnei
0a923b4ab3 Merge pull request #2187 from FearlessTobi/port-sdl-things
Port various Citra changes to input_common, including deadzone support
2019-03-13 11:46:57 -04:00
bunnei
e8a21f5276 Merge pull request #2166 from lioncash/vi-init-service
service/vi: Unstub GetDisplayService
2019-03-13 10:01:54 -04:00
bunnei
71c4e876ef Merge pull request #2231 from ReinUsesLisp/fixup-bias
video_core/texture: Fix up sampler lod bias
2019-03-13 09:58:58 -04:00
ReinUsesLisp
a63295a872 video_core/texture: Fix up sampler lod bias 2019-03-13 00:45:54 -03:00
Mat M
a3734d7e31 vk_sampler_cache: Use operator== instead of memcmp
Co-Authored-By: ReinUsesLisp <reinuseslisp@airmail.cc>
2019-03-12 21:05:36 -03:00
ReinUsesLisp
aa59d77c3b vk_sampler_cache: Implement a sampler cache 2019-03-12 20:20:57 -03:00
Lioncash
6eddb60db0 kernel/process: Remove use of global system accessors
Now that we pass in a reference to the system instance, we can utilize
it to eliminate the global accessors in Process-related code.
2019-03-12 19:03:28 -04:00
bunnei
3bfd199497 Merge pull request #2211 from lioncash/arbiter
kernel: Make the address arbiter instance per-process
2019-03-12 17:54:48 -04:00
bunnei
0f6dd7b64e Merge pull request #2222 from lioncash/cstr
service/service: Remove unncessary calls to c_str()
2019-03-12 17:54:20 -04:00
ReinUsesLisp
8ebeb9ade2 video_core/texture: Add a raw representation of TSCEntry 2019-03-12 16:56:29 -03:00
bunnei
2ad44a453f Merge pull request #2215 from ReinUsesLisp/samplers
gl_rasterizer: Encapsulate sampler queries into methods
2019-03-12 13:10:53 -04:00
Lioncash
3350c0a779 renderer_opengl/gl_global_cache: Replace indexing for assignment with insert_or_assign
The previous code had some minor issues with it, really not a big deal,
but amending it is basically 'free', so I figured, "why not?".

With the standard container maps, when:

map[key] = thing;

is done, this can cause potentially undesirable behavior in certain
scenarios. In particular, if there's no value associated with the key,
then the map constructs a default initialized instance of the value
type.

In this case, since it's a std::shared_ptr (as a type alias) that is
the value type, this will construct a std::shared_pointer, and then
assign over it (with objects that are quite large, or actively heap
allocate this can be extremely undesirable).

We also make the function take the region by value, as we can avoid a
copy (and by extension with std::shared_ptr, a copy causes an atomic
reference count increment), in certain scenarios when ownership isn't a
concern (i.e. when ReserveGlobalRegion is called with an rvalue
reference, then no copy at all occurs). So, it's more-or-less a "free"
gain without many downsides.
2019-03-11 12:20:35 -04:00
Lioncash
1070c020db renderer_opengl/gl_global_cache: Append missing override specifiers
Two of the functions here are overridden functions, so we can append
these specifiers to make it explicit.
2019-03-11 12:02:30 -04:00
Lioncash
aa44eb639b kernel/server_port: Make data members private
With this, all kernel objects finally have all of their data members
behind an interface, making it nicer to reason about interactions with
other code (as external code no longer has the freedom to totally alter
internals and potentially messing up invariants).
2019-03-11 10:41:05 -04:00
ReinUsesLisp
a6c048920e gl_rasterizer: Use system instance passed from argument 2019-03-11 03:17:21 -03:00
Lioncash
0c28ab92e6 core/hle/result: Remove now-unnecessary manually defined copy assignment operator
Previously this was required, as BitField wasn't trivially copyable.
BitField has since been made trivially copyable, so now this isn't
required anymore.
2019-03-10 18:34:20 -04:00
Lioncash
3f602dde0f core/hle/result: Amend error in comment description for ResultCode
Gets rid of another holdover from Citra, and describes the OS on the
Switch instead.
2019-03-10 18:29:31 -04:00
Lioncash
f7ec0bcfc2 core/hle/result: Remove now-unused constructor for ResultCode
Now that the final stray ErrorDescription member was relocated, we can
finally remove it and its relevant constructor in the ResultCode union.
2019-03-10 18:26:12 -04:00
Lioncash
d870cc5ad7 core/hle/result: Relocate IPC error code to ipc_helpers
Relocates the error code to where it's most related, similar to how all
the other error codes are. Previously we were including a non-generic
error in the main result code header.
2019-03-10 18:23:42 -04:00
Lioncash
8603f0f9b1 service/service: Remove unncessary calls to c_str()
These can just be passed regularly, now that we use fmt instead of our
old logging system.

While we're at it, make the parameters to MakeFunctionString
std::string_views.
2019-03-10 18:00:57 -04:00
bunnei
0aa824b12f Merge pull request #2207 from lioncash/hwopus
service/audio/hwopus: Move decoder state to its own class
2019-03-10 17:32:39 -04:00
bunnei
037d9bdde3 Merge pull request #2193 from lioncash/global
kernel/scheduler: Pass in system instance in constructor
2019-03-10 17:29:01 -04:00
bunnei
633ce92908 Merge pull request #2147 from ReinUsesLisp/texture-clean
shader_ir: Remove "extras" from the MetaTexture
2019-03-10 17:28:36 -04:00
bunnei
4a84921b31 Merge pull request #2143 from ReinUsesLisp/texview
gl_rasterizer_cache: Create texture views for array discrepancies
2019-03-10 17:27:49 -04:00
bunnei
add8b1df68 Merge pull request #2220 from lioncash/cubeb
audio_core/cubeb_sink: Convert _MSC_VER ifdefs to _WIN32
2019-03-10 17:26:20 -04:00
Mat M
0ea2771889 Merge pull request #2217 from ReinUsesLisp/rasterizer-logger
gl_rasterizer: Minor logger changes
2019-03-10 03:16:00 -04:00
Mat M
9ae680c639 Merge pull request #2219 from Hexagon12/log-settings
core/settings: Log more setting values
2019-03-10 03:15:01 -04:00
Mat M
46fdf8c819 Merge pull request #2218 from ReinUsesLisp/cmd-cast
yuzu_cmd/config: Silent implicit cast warning
2019-03-10 03:14:34 -04:00
Lioncash
4a4e87e971 audio_core/cubeb_sink: Convert _MSC_VER ifdefs to _WIN32
This behavior also needs to be visible for MinGW builds as well.
2019-03-09 18:06:23 -05:00
Hexagon12
e6f652ae12 clang fix 2019-03-09 16:42:56 +02:00
Hexagon12
6ce8de4b5f Log 2 new setting values 2019-03-09 14:58:15 +02:00
ReinUsesLisp
a0be7b3b92 gl_rasterizer: Encapsulate sampler queries into methods 2019-03-09 04:35:57 -03:00
ReinUsesLisp
45ef421b6b yuzu_cmd/config: Replace C casts with static_cast 2019-03-09 03:59:23 -03:00
ReinUsesLisp
fedef7bda3 yuzu_cmd/config: Silent implicit cast warning 2019-03-09 03:58:20 -03:00
ReinUsesLisp
6ee0ba64c8 gl_rasterizer: Minor logger changes 2019-03-09 03:34:49 -03:00
bunnei
9909d40530 Merge pull request #2210 from lioncash/optional
kernel/hle_ipc: Convert std::shared_ptr IPC header instances to std::optional
2019-03-08 16:35:57 -05:00
bunnei
160fc63c72 Merge pull request #2209 from lioncash/reorder
video_core/gpu_thread: Silence a -Wreorder warning
2019-03-08 12:04:26 -05:00
bunnei
78c803b4f3 Merge pull request #2208 from lioncash/gpu
video_core/gpu: Make GPU's destructor virtual
2019-03-08 12:03:58 -05:00
bunnei
1143923cdd Merge pull request #2191 from ReinUsesLisp/maxwell-to-vk
maxwell_to_vk: Initial implementation
2019-03-08 11:51:08 -05:00
bunnei
d10dffed44 Merge pull request #2212 from ReinUsesLisp/dma-push-fix
dma_pusher: Store command_list_header by copy
2019-03-08 11:48:32 -05:00
ReinUsesLisp
e7ac5a6adf dma_pusher: Store command_list_header by copy
Instead of holding a reference that will get invalidated by
dma_pushbuffer.pop(), hold it as a copy. This doesn't have any
performance cost since CommandListHeader is 8 bytes long.
2019-03-08 04:06:54 -03:00
Lioncash
fbb82e61e3 kernel/hle_ipc: Convert std::shared_ptr IPC header instances to std::optional
There's no real need to use a shared lifetime here, since we don't
actually expose them to anything else. This is also kind of an
unnecessary use of the heap given the objects themselves are so small;
small enough, in fact that changing over to optionals actually reduces
the overall size of the HLERequestContext struct (818 bytes to 808
bytes).
2019-03-07 23:34:37 -05:00
Lioncash
69749a88cd travis: Bump macOS version to 10.14
For whatever bizarre reason, Apple only made a few of std::optional's
member functions available on newer SDK versions. Given we can't even
run yuzu on macOS, and we keep the builder around to ensure that it
always at least compiles on macOS, we can bump this up a version.
2019-03-07 23:34:37 -05:00
Lioncash
8e510d5afa kernel: Make the address arbiter instance per-process
Now that we have the address arbiter extracted to its own class, we can
fix an innaccuracy with the kernel. Said inaccuracy being that there
isn't only one address arbiter. Each process instance contains its own
AddressArbiter instance in the actual kernel.

This fixes that and gets rid of another long-standing issue that could
arise when attempting to create more than one process.
2019-03-07 23:27:51 -05:00
Lioncash
b7f331afa3 kernel/svc: Move address arbiter signaling behind a unified API function
Similar to how WaitForAddress was isolated to its own function, we can
also move the necessary conditional checking into the address arbiter
class itself, allowing us to hide the implementation details of it from
public use.
2019-03-07 23:27:47 -05:00
Lioncash
0209de123b kernel/svc: Move address arbiter waiting behind a unified API function
Rather than let the service call itself work out which function is the
proper one to call, we can make that a behavior of the arbiter itself,
so we don't need to directly expose those implementation details.
2019-03-07 23:27:20 -05:00
bunnei
d26ee6e01e Merge pull request #2195 from lioncash/shared-global
kernel/shared_memory: Get rid of the use of global accessor functions within Create()
2019-03-07 17:26:11 -05:00
Lioncash
e99a148628 common/bit_field: Make BitField trivially copyable
This makes the class much more flexible and doesn't make performing
copies with classes that contain a bitfield member a pain.

Given BitField instances are only intended to be used within unions, the
fact the full storage value would be copied isn't a big concern (only
sizeof(union_type) would be copied anyways).

While we're at it, provide defaulted move constructors for consistency.
2019-03-07 17:05:44 -05:00
Lioncash
c2d4c8b95e video_core/gpu_thread: Remove unimplemented WaitForIdle function prototype
This function didn't have a definition, so we can remove it to prevent
accidentally attempting to use it.
2019-03-07 16:08:52 -05:00
Lioncash
48a461a629 video_core/gpu_thread: Amend constructor initializer list order
Moves the data members to satisfy the order they're declared as in the
constructor initializer list.

Silences a -Wreorder warning.
2019-03-07 16:05:49 -05:00
Lioncash
24e2e601d5 video_core/gpu: Make GPU's destructor virtual
Because of the recent separation of GPU functionality into sync/async
variants, we need to mark the destructor virtual to provide proper
destruction behavior, given we use the base class within the System
class.

Prior to this, it was undefined behavior whether or not the destructor
in the derived classes would ever execute.
2019-03-07 15:59:45 -05:00
bunnei
3b63a46ca4 Merge pull request #2196 from DarkLordZach/web-applet-esc
web_browser: Add shortcut to Enter key to exit applet
2019-03-07 15:32:32 -05:00
bunnei
c63a0e88b7 Merge pull request #2202 from lioncash/port-priv
kernel/client_session, kernel/server_session: Make data members private
2019-03-07 15:31:26 -05:00
bunnei
1a4d733ec7 Merge pull request #2205 from FearlessTobi/docked-undocked-hotkey
yuzu: add a hotkey to switch between undocked and docked mode
2019-03-07 11:33:24 -05:00
zhupengfei
39e895c5ff citra_qt: Settings (configuration) rework 2019-03-07 16:55:50 +01:00
bunnei
d9e9e71aec Merge pull request #2206 from lioncash/audio-stop
service/audio/audout_u: Only actually stop the audio stream in StopAudioOut if the stream is playing
2019-03-07 10:47:59 -05:00
bunnei
4f352833a5 Merge pull request #2055 from bunnei/gpu-thread
Asynchronous GPU command processing
2019-03-07 10:41:53 -05:00
Lioncash
d03ae881fd service/audio/hwopus: Move decoder state to its own class
Moves the non-multistream specific state to its own class. This will be
necessary to support the multistream variants of opus decoding.
2019-03-07 07:47:09 -05:00
Lioncash
960057cba0 service/audio/hwopus: Provide a name for the second word of OpusPacketHeader
This indicates the entropy coder's final range.
2019-03-07 05:48:35 -05:00
Lioncash
d41d85766f service/audio/hwopus: Move Opus packet header out of the IHardwareOpusDecoderManager
This will be utilized by more than just that class in the future. This
also renames it from OpusHeader to OpusPacketHeader to be more specific
about what kind of header it is.
2019-03-07 05:37:08 -05:00
Lioncash
3293877456 service/audio/hwopus: Enclose internals in an anonymous namespace
Makes it impossible to violate the ODR, as well as providing a place for
future changes.
2019-03-07 05:32:42 -05:00
Lioncash
64e7524f36 service/audio/audout_u: Only actually stop the audio stream in StopAudioOut if the stream is playing
The service itself only does further actions if the stream is playing.
If the stream is already stopped, then it just exits successfully.
2019-03-07 03:39:01 -05:00
bunnei
076c76f4e4 Merge pull request #2149 from ReinUsesLisp/decoders-style
gl_rasterizer_cache: Move format conversion functions to their own file
2019-03-06 21:56:20 -05:00
bunnei
ed0bdcc638 Merge pull request #2197 from lioncash/include
core/hle/ipc: Remove unnecessary includes
2019-03-06 21:55:16 -05:00
bunnei
d2ff93c319 Merge pull request #2190 from lioncash/ogl-global
core: Remove the global telemetry accessor function
2019-03-06 21:41:53 -05:00
fearlessTobi
c8d6f0cb82 yuzu: add a hotkey to switch between undocked and docked mode 2019-03-06 19:31:23 +01:00
Lioncash
221613d4ea kernel/server_session: Make data members private
Makes it much nicer to locally reason about server session behavior, as
part of its functionality isn't placed around other classes.
2019-03-05 20:10:07 -05:00
Lioncash
7526b6fce3 kernel/client_session: Make data members private
These can be made private, as they aren't accessed in contexts that
require them to be public.
2019-03-05 20:10:03 -05:00
Lioncash
02bc9e9de1 core/hle/ipc: Remove unnecessary includes
Removes a few inclusion dependencies from the headers or replaces
existing ones with ones that don't indirectly include the required
headers.

This allows removing an inclusion of core/memory.h, meaning that if the
memory header is ever changed in the future, it won't result in
rebuilding the entirety of the HLE services (as the IPC headers are used
quite ubiquitously throughout the HLE service implementations).
2019-03-05 09:53:38 -05:00
Zach Hilman
4130b07f88 web_browser: Add shortcut to Enter key to exit applet
Addresses issues where a user in fullscreen could not exit some web applets without leaving fullscreen.
2019-03-04 18:26:28 -05:00
Lioncash
fad20213e6 kernel/scheduler: Pass in system instance in constructor
Avoids directly relying on the global system instance and instead makes
an arbitrary system instance an explicit dependency on construction.

This also allows removing dependencies on some global accessor functions
as well.
2019-03-04 17:01:37 -05:00
Lioncash
f59040d752 kernel/shared_memory: Get rid of the use of global accessor functions within Create()
Given we already pass in a reference to the kernel that the shared
memory instance is created under, we can just use that to check the
current process, rather than using the global accessor functions.

This allows removing direct dependency on the system instance entirely.
2019-03-04 16:52:36 -05:00
Lioncash
b114928459 core/core: Remove the global telemetry accessor function
With all usages converted off of it, this function can be removed.
2019-03-04 10:24:13 -05:00
Lioncash
319365fdf0 yuzu: Remove usage of the global telemetry accessor
In these cases the system object is nearby, and in the other, the
long-form of accessing the telemetry instance is already used, so we can
get rid of the use of the global accessor.
2019-03-04 10:24:13 -05:00
Lioncash
697a4669e1 yuzu-cmd/yuzu: Replace direct usage of the global system telemetry accessor in main()
We already have the system instance around, so we can use that instead
of the accessor.
2019-03-04 10:24:13 -05:00
Lioncash
b5f0dc95db core/core: Replace direct usage of the global system telemetry accessor from Shutdown()
The telemetry instance is actually a member of the class itself, so we
can access it directly instead of going through the global accessor.
2019-03-04 10:24:13 -05:00
Lioncash
90febaf717 video_core/renderer_opengl: Replace direct usage of global system object accessors
We already pass a reference to the system object to the constructor of the renderer,
so we can just use that instead of using the global accessor functions.
2019-03-04 10:24:09 -05:00
ReinUsesLisp
1f6571b3de maxwell_to_vk: Initial implementation 2019-03-04 04:06:05 -03:00
B3n30
71817afbe9 fixup! Joystick: Allow for background events; Add deadzone to SDLAnalog 2019-03-02 19:12:46 +01:00
Weiyi Wang
8b98f60e3c input/sdl: lock map mutex after SDL call
Any SDL invocation can call the even callback on the same thread, which can call GetSDLJoystickBySDLID and eventually cause double lock on joystick_map_mutex. To avoid this, lock guard should be placed as closer as possible to the object accessing code, so that any SDL invocation is with the mutex unlocked
2019-03-02 19:09:58 +01:00
James Rowe
09ac66388c Input: Remove global variables from SDL Input
Changes the interface as well to remove any unique methods that
frontends needed to call such as StartJoystickEventHandler by
conditionally starting the polling thread only if the frontend hasn't
started it already. Additionally, moves all global state into a single
SDLState class in order to guarantee that the destructors are called in
the proper order
2019-03-02 19:09:34 +01:00
James Rowe
c8554d218b Input: Copy current SDL.h/cpp files to impl
This should make reviewing much easier as you can then see what changed
happened between the old file and the new one
2019-03-02 18:38:11 +01:00
ReinUsesLisp
27ddbeb01c gl_rasterizer_cache: Create texture views for array discrepancies
When a texture is sampled in a shader with a different array mode than
the cached state, create a texture view and bind that to the shader
instead.
2019-02-27 14:41:06 -03:00
Lioncash
92ea1c32d6 service/vi: Unstub GetDisplayService
This function is also supposed to check its given policy type with the
permission of the service itself. This implements the necessary
machinery to unstub these functions.

Policy::User seems to just be basic access (which is probably why vi:u
is restricted to that policy), while the other policy seems to be for
extended abilities regarding which displays can be managed and queried,
so this is assumed to be for a background compositor (which I've named,
appropriately, Policy::Compositor).
2019-02-26 20:16:23 -05:00
Lioncash
254b1e3df7 core/ipc_helper: Allow popping all signed value types with RequestParser
There's no real reason this shouldn't be allowed, given some values sent
via a request can be signed. This also makes it less annoying to work
with popping enum values, given an enum class with no type specifier
will work out of the box now.

It's also kind of an oversight to allow popping s64 values, but nothing
else.
2019-02-26 18:10:36 -05:00
ReinUsesLisp
0ad3c031f4 gl_rasterizer_cache: Move format conversion to its own file 2019-02-26 20:08:27 -03:00
ReinUsesLisp
0ccd490fcd decoders: Minor style changes 2019-02-26 20:08:27 -03:00
Lioncash
1b2872eebc service/vi: Remove use of a module class
This didn't really provide much benefit here, especially since the
subsequent change requires that the behavior for each service's
GetDisplayService differs in a minor detail.

This also arguably makes the services nicer to read, since it gets rid
of an indirection in the class hierarchy.
2019-02-26 17:44:03 -05:00
ReinUsesLisp
5ca63d0675 shader/decode: Remove extras from MetaTexture 2019-02-26 00:11:30 -03:00
ReinUsesLisp
48e6f77c03 shader/decode: Split memory and texture instructions decoding 2019-02-26 00:11:30 -03:00
118 changed files with 3870 additions and 2675 deletions

View File

@@ -24,7 +24,7 @@ matrix:
- os: osx
env: NAME="macos build"
sudo: false
osx_image: xcode10
osx_image: xcode10.1
install: "./.travis/macos/deps.sh"
script: "./.travis/macos/build.sh"
after_success: "./.travis/macos/upload.sh"

View File

@@ -2,7 +2,7 @@
set -o pipefail
export MACOSX_DEPLOYMENT_TARGET=10.13
export MACOSX_DEPLOYMENT_TARGET=10.14
export Qt5_DIR=$(brew --prefix)/opt/qt5
export UNICORNDIR=$(pwd)/externals/unicorn
export PATH="/usr/local/opt/ccache/libexec:$PATH"

View File

@@ -73,6 +73,7 @@ set(HASH_FILES
"${VIDEO_CORE}/shader/decode/integer_set.cpp"
"${VIDEO_CORE}/shader/decode/integer_set_predicate.cpp"
"${VIDEO_CORE}/shader/decode/memory.cpp"
"${VIDEO_CORE}/shader/decode/texture.cpp"
"${VIDEO_CORE}/shader/decode/other.cpp"
"${VIDEO_CORE}/shader/decode/predicate_set_predicate.cpp"
"${VIDEO_CORE}/shader/decode/predicate_set_register.cpp"

View File

@@ -12,7 +12,7 @@
#include "common/ring_buffer.h"
#include "core/settings.h"
#ifdef _MSC_VER
#ifdef _WIN32
#include <objbase.h>
#endif
@@ -113,7 +113,7 @@ private:
CubebSink::CubebSink(std::string_view target_device_name) {
// Cubeb requires COM to be initialized on the thread calling cubeb_init on Windows
#ifdef _MSC_VER
#ifdef _WIN32
com_init_result = CoInitializeEx(nullptr, COINIT_MULTITHREADED);
#endif
@@ -152,7 +152,7 @@ CubebSink::~CubebSink() {
cubeb_destroy(ctx);
#ifdef _MSC_VER
#ifdef _WIN32
if (SUCCEEDED(com_init_result)) {
CoUninitialize();
}

View File

@@ -26,7 +26,7 @@ private:
cubeb_devid output_device{};
std::vector<SinkStreamPtr> sink_streams;
#ifdef _MSC_VER
#ifdef _WIN32
u32 com_init_result = 0;
#endif
};

View File

@@ -47,6 +47,7 @@ add_custom_command(OUTPUT scm_rev.cpp
"${VIDEO_CORE}/shader/decode/integer_set.cpp"
"${VIDEO_CORE}/shader/decode/integer_set_predicate.cpp"
"${VIDEO_CORE}/shader/decode/memory.cpp"
"${VIDEO_CORE}/shader/decode/texture.cpp"
"${VIDEO_CORE}/shader/decode/other.cpp"
"${VIDEO_CORE}/shader/decode/predicate_set_predicate.cpp"
"${VIDEO_CORE}/shader/decode/predicate_set_register.cpp"

View File

@@ -111,12 +111,6 @@
template <std::size_t Position, std::size_t Bits, typename T>
struct BitField {
private:
// We hide the copy assigment operator here, because the default copy
// assignment would copy the full storage value, rather than just the bits
// relevant to this particular bit field.
// We don't delete it because we want BitField to be trivially copyable.
constexpr BitField& operator=(const BitField&) = default;
// UnderlyingType is T for non-enum types and the underlying type of T if
// T is an enumeration. Note that T is wrapped within an enable_if in the
// former case to workaround compile errors which arise when using
@@ -163,9 +157,13 @@ public:
BitField(T val) = delete;
BitField& operator=(T val) = delete;
// Force default constructor to be created
// so that we can use this within unions
constexpr BitField() = default;
constexpr BitField() noexcept = default;
constexpr BitField(const BitField&) noexcept = default;
constexpr BitField& operator=(const BitField&) noexcept = default;
constexpr BitField(BitField&&) noexcept = default;
constexpr BitField& operator=(BitField&&) noexcept = default;
constexpr FORCE_INLINE operator T() const {
return Value();

View File

@@ -116,7 +116,7 @@ struct System::Impl {
if (web_browser == nullptr)
web_browser = std::make_unique<Core::Frontend::DefaultWebBrowserApplet>();
auto main_process = Kernel::Process::Create(kernel, "main");
auto main_process = Kernel::Process::Create(system, "main");
kernel.MakeCurrentProcess(main_process.get());
telemetry_session = std::make_unique<Core::TelemetrySession>();
@@ -190,13 +190,13 @@ struct System::Impl {
void Shutdown() {
// Log last frame performance stats
auto perf_results = GetAndResetPerfStats();
Telemetry().AddField(Telemetry::FieldType::Performance, "Shutdown_EmulationSpeed",
perf_results.emulation_speed * 100.0);
Telemetry().AddField(Telemetry::FieldType::Performance, "Shutdown_Framerate",
perf_results.game_fps);
Telemetry().AddField(Telemetry::FieldType::Performance, "Shutdown_Frametime",
perf_results.frametime * 1000.0);
const auto perf_results = GetAndResetPerfStats();
telemetry_session->AddField(Telemetry::FieldType::Performance, "Shutdown_EmulationSpeed",
perf_results.emulation_speed * 100.0);
telemetry_session->AddField(Telemetry::FieldType::Performance, "Shutdown_Framerate",
perf_results.game_fps);
telemetry_session->AddField(Telemetry::FieldType::Performance, "Shutdown_Frametime",
perf_results.frametime * 1000.0);
is_powered_on = false;

View File

@@ -293,10 +293,6 @@ inline ARM_Interface& CurrentArmInterface() {
return System::GetInstance().CurrentArmInterface();
}
inline TelemetrySession& Telemetry() {
return System::GetInstance().TelemetrySession();
}
inline Kernel::Process* CurrentProcess() {
return System::GetInstance().CurrentProcess();
}

View File

@@ -11,6 +11,7 @@
#endif
#include "core/arm/exclusive_monitor.h"
#include "core/arm/unicorn/arm_unicorn.h"
#include "core/core.h"
#include "core/core_cpu.h"
#include "core/core_timing.h"
#include "core/hle/kernel/scheduler.h"
@@ -49,9 +50,9 @@ bool CpuBarrier::Rendezvous() {
return false;
}
Cpu::Cpu(Timing::CoreTiming& core_timing, ExclusiveMonitor& exclusive_monitor,
CpuBarrier& cpu_barrier, std::size_t core_index)
: cpu_barrier{cpu_barrier}, core_timing{core_timing}, core_index{core_index} {
Cpu::Cpu(System& system, ExclusiveMonitor& exclusive_monitor, CpuBarrier& cpu_barrier,
std::size_t core_index)
: cpu_barrier{cpu_barrier}, core_timing{system.CoreTiming()}, core_index{core_index} {
if (Settings::values.use_cpu_jit) {
#ifdef ARCHITECTURE_x86_64
arm_interface = std::make_unique<ARM_Dynarmic>(core_timing, exclusive_monitor, core_index);
@@ -63,7 +64,7 @@ Cpu::Cpu(Timing::CoreTiming& core_timing, ExclusiveMonitor& exclusive_monitor,
arm_interface = std::make_unique<ARM_Unicorn>(core_timing);
}
scheduler = std::make_unique<Kernel::Scheduler>(*arm_interface);
scheduler = std::make_unique<Kernel::Scheduler>(system, *arm_interface);
}
Cpu::~Cpu() = default;

View File

@@ -15,6 +15,10 @@ namespace Kernel {
class Scheduler;
}
namespace Core {
class System;
}
namespace Core::Timing {
class CoreTiming;
}
@@ -45,8 +49,8 @@ private:
class Cpu {
public:
Cpu(Timing::CoreTiming& core_timing, ExclusiveMonitor& exclusive_monitor,
CpuBarrier& cpu_barrier, std::size_t core_index);
Cpu(System& system, ExclusiveMonitor& exclusive_monitor, CpuBarrier& cpu_barrier,
std::size_t core_index);
~Cpu();
void RunLoop(bool tight_loop = true);

View File

@@ -27,8 +27,7 @@ void CpuCoreManager::Initialize(System& system) {
exclusive_monitor = Cpu::MakeExclusiveMonitor(cores.size());
for (std::size_t index = 0; index < cores.size(); ++index) {
cores[index] =
std::make_unique<Cpu>(system.CoreTiming(), *exclusive_monitor, *barrier, index);
cores[index] = std::make_unique<Cpu>(system, *exclusive_monitor, *barrier, index);
}
// Create threads for CPU cores 1-3, and build thread_to_cpu map

View File

@@ -4,10 +4,10 @@
#pragma once
#include "common/bit_field.h"
#include "common/common_funcs.h"
#include "common/common_types.h"
#include "common/swap.h"
#include "core/hle/kernel/errors.h"
#include "core/memory.h"
namespace IPC {

View File

@@ -19,9 +19,12 @@
#include "core/hle/kernel/hle_ipc.h"
#include "core/hle/kernel/object.h"
#include "core/hle/kernel/server_session.h"
#include "core/hle/result.h"
namespace IPC {
constexpr ResultCode ERR_REMOTE_PROCESS_DEAD{ErrorModule::HIPC, 301};
class RequestHelperBase {
protected:
Kernel::HLERequestContext* context = nullptr;
@@ -350,7 +353,7 @@ public:
template <class T>
std::shared_ptr<T> PopIpcInterface() {
ASSERT(context->Session()->IsDomain());
ASSERT(context->GetDomainMessageHeader()->input_object_count > 0);
ASSERT(context->GetDomainMessageHeader().input_object_count > 0);
return context->GetDomainRequestHandler<T>(Pop<u32>() - 1);
}
};
@@ -362,6 +365,11 @@ inline u32 RequestParser::Pop() {
return cmdbuf[index++];
}
template <>
inline s32 RequestParser::Pop() {
return static_cast<s32>(Pop<u32>());
}
template <typename T>
void RequestParser::PopRaw(T& value) {
std::memcpy(&value, cmdbuf + index, sizeof(T));
@@ -392,6 +400,16 @@ inline u64 RequestParser::Pop() {
return msw << 32 | lsw;
}
template <>
inline s8 RequestParser::Pop() {
return static_cast<s8>(Pop<u8>());
}
template <>
inline s16 RequestParser::Pop() {
return static_cast<s16>(Pop<u16>());
}
template <>
inline s64 RequestParser::Pop() {
return static_cast<s64>(Pop<u64>());

View File

@@ -42,7 +42,21 @@ void WakeThreads(const std::vector<SharedPtr<Thread>>& waiting_threads, s32 num_
AddressArbiter::AddressArbiter(Core::System& system) : system{system} {}
AddressArbiter::~AddressArbiter() = default;
ResultCode AddressArbiter::SignalToAddress(VAddr address, s32 num_to_wake) {
ResultCode AddressArbiter::SignalToAddress(VAddr address, SignalType type, s32 value,
s32 num_to_wake) {
switch (type) {
case SignalType::Signal:
return SignalToAddressOnly(address, num_to_wake);
case SignalType::IncrementAndSignalIfEqual:
return IncrementAndSignalToAddressIfEqual(address, value, num_to_wake);
case SignalType::ModifyByWaitingCountAndSignalIfEqual:
return ModifyByWaitingCountAndSignalToAddressIfEqual(address, value, num_to_wake);
default:
return ERR_INVALID_ENUM_VALUE;
}
}
ResultCode AddressArbiter::SignalToAddressOnly(VAddr address, s32 num_to_wake) {
const std::vector<SharedPtr<Thread>> waiting_threads = GetThreadsWaitingOnAddress(address);
WakeThreads(waiting_threads, num_to_wake);
return RESULT_SUCCESS;
@@ -60,7 +74,7 @@ ResultCode AddressArbiter::IncrementAndSignalToAddressIfEqual(VAddr address, s32
}
Memory::Write32(address, static_cast<u32>(value + 1));
return SignalToAddress(address, num_to_wake);
return SignalToAddressOnly(address, num_to_wake);
}
ResultCode AddressArbiter::ModifyByWaitingCountAndSignalToAddressIfEqual(VAddr address, s32 value,
@@ -92,6 +106,20 @@ ResultCode AddressArbiter::ModifyByWaitingCountAndSignalToAddressIfEqual(VAddr a
return RESULT_SUCCESS;
}
ResultCode AddressArbiter::WaitForAddress(VAddr address, ArbitrationType type, s32 value,
s64 timeout_ns) {
switch (type) {
case ArbitrationType::WaitIfLessThan:
return WaitForAddressIfLessThan(address, value, timeout_ns, false);
case ArbitrationType::DecrementAndWaitIfLessThan:
return WaitForAddressIfLessThan(address, value, timeout_ns, true);
case ArbitrationType::WaitIfEqual:
return WaitForAddressIfEqual(address, value, timeout_ns);
default:
return ERR_INVALID_ENUM_VALUE;
}
}
ResultCode AddressArbiter::WaitForAddressIfLessThan(VAddr address, s32 value, s64 timeout,
bool should_decrement) {
// Ensure that we can read the address.
@@ -113,7 +141,7 @@ ResultCode AddressArbiter::WaitForAddressIfLessThan(VAddr address, s32 value, s6
return RESULT_TIMEOUT;
}
return WaitForAddress(address, timeout);
return WaitForAddressImpl(address, timeout);
}
ResultCode AddressArbiter::WaitForAddressIfEqual(VAddr address, s32 value, s64 timeout) {
@@ -130,10 +158,10 @@ ResultCode AddressArbiter::WaitForAddressIfEqual(VAddr address, s32 value, s64 t
return RESULT_TIMEOUT;
}
return WaitForAddress(address, timeout);
return WaitForAddressImpl(address, timeout);
}
ResultCode AddressArbiter::WaitForAddress(VAddr address, s64 timeout) {
ResultCode AddressArbiter::WaitForAddressImpl(VAddr address, s64 timeout) {
SharedPtr<Thread> current_thread = system.CurrentScheduler().GetCurrentThread();
current_thread->SetArbiterWaitAddress(address);
current_thread->SetStatus(ThreadStatus::WaitArb);

View File

@@ -4,8 +4,10 @@
#pragma once
#include <vector>
#include "common/common_types.h"
#include "core/hle/kernel/address_arbiter.h"
#include "core/hle/kernel/object.h"
union ResultCode;
@@ -40,8 +42,15 @@ public:
AddressArbiter(AddressArbiter&&) = default;
AddressArbiter& operator=(AddressArbiter&&) = delete;
/// Signals an address being waited on with a particular signaling type.
ResultCode SignalToAddress(VAddr address, SignalType type, s32 value, s32 num_to_wake);
/// Waits on an address with a particular arbitration type.
ResultCode WaitForAddress(VAddr address, ArbitrationType type, s32 value, s64 timeout_ns);
private:
/// Signals an address being waited on.
ResultCode SignalToAddress(VAddr address, s32 num_to_wake);
ResultCode SignalToAddressOnly(VAddr address, s32 num_to_wake);
/// Signals an address being waited on and increments its value if equal to the value argument.
ResultCode IncrementAndSignalToAddressIfEqual(VAddr address, s32 value, s32 num_to_wake);
@@ -59,9 +68,8 @@ public:
/// Waits on an address if the value passed is equal to the argument value.
ResultCode WaitForAddressIfEqual(VAddr address, s32 value, s64 timeout);
private:
// Waits on the given address with a timeout in nanoseconds
ResultCode WaitForAddress(VAddr address, s64 timeout);
ResultCode WaitForAddressImpl(VAddr address, s64 timeout);
// Gets the threads waiting on an address.
std::vector<SharedPtr<Thread>> GetThreadsWaitingOnAddress(VAddr address) const;

View File

@@ -33,10 +33,11 @@ ResultVal<SharedPtr<ClientSession>> ClientPort::Connect() {
// Create a new session pair, let the created sessions inherit the parent port's HLE handler.
auto sessions = ServerSession::CreateSessionPair(kernel, server_port->GetName(), this);
if (server_port->hle_handler)
server_port->hle_handler->ClientConnected(std::get<SharedPtr<ServerSession>>(sessions));
else
server_port->pending_sessions.push_back(std::get<SharedPtr<ServerSession>>(sessions));
if (server_port->HasHLEHandler()) {
server_port->GetHLEHandler()->ClientConnected(std::get<SharedPtr<ServerSession>>(sessions));
} else {
server_port->AppendPendingSession(std::get<SharedPtr<ServerSession>>(sessions));
}
// Wake the threads waiting on the ServerPort
server_port->WakeupAllWaitingThreads();

View File

@@ -17,21 +17,11 @@ ClientSession::~ClientSession() {
// This destructor will be called automatically when the last ClientSession handle is closed by
// the emulated application.
// Local references to ServerSession and SessionRequestHandler are necessary to guarantee they
// A local reference to the ServerSession is necessary to guarantee it
// will be kept alive until after ClientDisconnected() returns.
SharedPtr<ServerSession> server = parent->server;
if (server) {
std::shared_ptr<SessionRequestHandler> hle_handler = server->hle_handler;
if (hle_handler)
hle_handler->ClientDisconnected(server);
// TODO(Subv): Force a wake up of all the ServerSession's waiting threads and set
// their WaitSynchronization result to 0xC920181A.
// Clean up the list of client threads with pending requests, they are unneeded now that the
// client endpoint is closed.
server->pending_requesting_threads.clear();
server->currently_handling = nullptr;
server->ClientDisconnected();
}
parent->client = nullptr;

View File

@@ -36,14 +36,15 @@ public:
ResultCode SendSyncRequest(SharedPtr<Thread> thread);
std::string name; ///< Name of client port (optional)
private:
explicit ClientSession(KernelCore& kernel);
~ClientSession() override;
/// The parent session, which links to the server endpoint.
std::shared_ptr<Session> parent;
private:
explicit ClientSession(KernelCore& kernel);
~ClientSession() override;
/// Name of the client session (optional)
std::string name;
};
} // namespace Kernel

View File

@@ -86,7 +86,7 @@ HLERequestContext::~HLERequestContext() = default;
void HLERequestContext::ParseCommandBuffer(const HandleTable& handle_table, u32_le* src_cmdbuf,
bool incoming) {
IPC::RequestParser rp(src_cmdbuf);
command_header = std::make_shared<IPC::CommandHeader>(rp.PopRaw<IPC::CommandHeader>());
command_header = rp.PopRaw<IPC::CommandHeader>();
if (command_header->type == IPC::CommandType::Close) {
// Close does not populate the rest of the IPC header
@@ -95,8 +95,7 @@ void HLERequestContext::ParseCommandBuffer(const HandleTable& handle_table, u32_
// If handle descriptor is present, add size of it
if (command_header->enable_handle_descriptor) {
handle_descriptor_header =
std::make_shared<IPC::HandleDescriptorHeader>(rp.PopRaw<IPC::HandleDescriptorHeader>());
handle_descriptor_header = rp.PopRaw<IPC::HandleDescriptorHeader>();
if (handle_descriptor_header->send_current_pid) {
rp.Skip(2, false);
}
@@ -140,16 +139,15 @@ void HLERequestContext::ParseCommandBuffer(const HandleTable& handle_table, u32_
// If this is an incoming message, only CommandType "Request" has a domain header
// All outgoing domain messages have the domain header, if only incoming has it
if (incoming || domain_message_header) {
domain_message_header =
std::make_shared<IPC::DomainMessageHeader>(rp.PopRaw<IPC::DomainMessageHeader>());
domain_message_header = rp.PopRaw<IPC::DomainMessageHeader>();
} else {
if (Session()->IsDomain())
if (Session()->IsDomain()) {
LOG_WARNING(IPC, "Domain request has no DomainMessageHeader!");
}
}
}
data_payload_header =
std::make_shared<IPC::DataPayloadHeader>(rp.PopRaw<IPC::DataPayloadHeader>());
data_payload_header = rp.PopRaw<IPC::DataPayloadHeader>();
data_payload_offset = rp.GetCurrentOffset();
@@ -264,11 +262,11 @@ ResultCode HLERequestContext::WriteToOutgoingCommandBuffer(Thread& thread) {
// Write the domain objects to the command buffer, these go after the raw untranslated data.
// TODO(Subv): This completely ignores C buffers.
std::size_t domain_offset = size - domain_message_header->num_objects;
auto& request_handlers = server_session->domain_request_handlers;
for (auto& object : domain_objects) {
request_handlers.emplace_back(object);
dst_cmdbuf[domain_offset++] = static_cast<u32_le>(request_handlers.size());
for (const auto& object : domain_objects) {
server_session->AppendDomainRequestHandler(object);
dst_cmdbuf[domain_offset++] =
static_cast<u32_le>(server_session->NumDomainRequestHandlers());
}
}

View File

@@ -6,6 +6,7 @@
#include <array>
#include <memory>
#include <optional>
#include <string>
#include <type_traits>
#include <vector>
@@ -15,6 +16,8 @@
#include "core/hle/ipc.h"
#include "core/hle/kernel/object.h"
union ResultCode;
namespace Service {
class ServiceFrameworkBase;
}
@@ -166,12 +169,12 @@ public:
return buffer_c_desciptors;
}
const IPC::DomainMessageHeader* GetDomainMessageHeader() const {
return domain_message_header.get();
const IPC::DomainMessageHeader& GetDomainMessageHeader() const {
return domain_message_header.value();
}
bool HasDomainMessageHeader() const {
return domain_message_header != nullptr;
return domain_message_header.has_value();
}
/// Helper function to read a buffer using the appropriate buffer descriptor
@@ -208,14 +211,12 @@ public:
template <typename T>
SharedPtr<T> GetCopyObject(std::size_t index) {
ASSERT(index < copy_objects.size());
return DynamicObjectCast<T>(copy_objects[index]);
return DynamicObjectCast<T>(copy_objects.at(index));
}
template <typename T>
SharedPtr<T> GetMoveObject(std::size_t index) {
ASSERT(index < move_objects.size());
return DynamicObjectCast<T>(move_objects[index]);
return DynamicObjectCast<T>(move_objects.at(index));
}
void AddMoveObject(SharedPtr<Object> object) {
@@ -232,7 +233,7 @@ public:
template <typename T>
std::shared_ptr<T> GetDomainRequestHandler(std::size_t index) const {
return std::static_pointer_cast<T>(domain_request_handlers[index]);
return std::static_pointer_cast<T>(domain_request_handlers.at(index));
}
void SetDomainRequestHandlers(
@@ -272,10 +273,10 @@ private:
boost::container::small_vector<SharedPtr<Object>, 8> copy_objects;
boost::container::small_vector<std::shared_ptr<SessionRequestHandler>, 8> domain_objects;
std::shared_ptr<IPC::CommandHeader> command_header;
std::shared_ptr<IPC::HandleDescriptorHeader> handle_descriptor_header;
std::shared_ptr<IPC::DataPayloadHeader> data_payload_header;
std::shared_ptr<IPC::DomainMessageHeader> domain_message_header;
std::optional<IPC::CommandHeader> command_header;
std::optional<IPC::HandleDescriptorHeader> handle_descriptor_header;
std::optional<IPC::DataPayloadHeader> data_payload_header;
std::optional<IPC::DomainMessageHeader> domain_message_header;
std::vector<IPC::BufferDescriptorX> buffer_x_desciptors;
std::vector<IPC::BufferDescriptorABW> buffer_a_desciptors;
std::vector<IPC::BufferDescriptorABW> buffer_b_desciptors;

View File

@@ -87,7 +87,7 @@ static void ThreadWakeupCallback(u64 thread_handle, [[maybe_unused]] int cycles_
}
struct KernelCore::Impl {
explicit Impl(Core::System& system) : address_arbiter{system}, system{system} {}
explicit Impl(Core::System& system) : system{system} {}
void Initialize(KernelCore& kernel) {
Shutdown();
@@ -138,8 +138,6 @@ struct KernelCore::Impl {
std::vector<SharedPtr<Process>> process_list;
Process* current_process = nullptr;
Kernel::AddressArbiter address_arbiter;
SharedPtr<ResourceLimit> system_resource_limit;
Core::Timing::EventType* thread_wakeup_event_type = nullptr;
@@ -192,14 +190,6 @@ const Process* KernelCore::CurrentProcess() const {
return impl->current_process;
}
AddressArbiter& KernelCore::AddressArbiter() {
return impl->address_arbiter;
}
const AddressArbiter& KernelCore::AddressArbiter() const {
return impl->address_arbiter;
}
void KernelCore::AddNamedPort(std::string name, SharedPtr<ClientPort> port) {
impl->named_ports.emplace(std::move(name), std::move(port));
}

View File

@@ -75,12 +75,6 @@ public:
/// Retrieves a const pointer to the current process.
const Process* CurrentProcess() const;
/// Provides a reference to the kernel's address arbiter.
Kernel::AddressArbiter& AddressArbiter();
/// Provides a const reference to the kernel's address arbiter.
const Kernel::AddressArbiter& AddressArbiter() const;
/// Adds a port to the named port table
void AddNamedPort(std::string name, SharedPtr<ClientPort> port);

View File

@@ -53,9 +53,10 @@ void SetupMainThread(Process& owner_process, KernelCore& kernel, VAddr entry_poi
CodeSet::CodeSet() = default;
CodeSet::~CodeSet() = default;
SharedPtr<Process> Process::Create(KernelCore& kernel, std::string&& name) {
SharedPtr<Process> process(new Process(kernel));
SharedPtr<Process> Process::Create(Core::System& system, std::string&& name) {
auto& kernel = system.Kernel();
SharedPtr<Process> process(new Process(system));
process->name = std::move(name);
process->resource_limit = kernel.GetSystemResourceLimit();
process->status = ProcessStatus::Created;
@@ -132,7 +133,7 @@ void Process::PrepareForTermination() {
if (thread->GetOwnerProcess() != this)
continue;
if (thread == GetCurrentThread())
if (thread == system.CurrentScheduler().GetCurrentThread())
continue;
// TODO(Subv): When are the other running/ready threads terminated?
@@ -144,7 +145,6 @@ void Process::PrepareForTermination() {
}
};
const auto& system = Core::System::GetInstance();
stop_threads(system.Scheduler(0).GetThreadList());
stop_threads(system.Scheduler(1).GetThreadList());
stop_threads(system.Scheduler(2).GetThreadList());
@@ -227,14 +227,12 @@ void Process::LoadModule(CodeSet module_, VAddr base_addr) {
MapSegment(module_.DataSegment(), VMAPermission::ReadWrite, MemoryState::CodeMutable);
// Clear instruction cache in CPU JIT
Core::System::GetInstance().ArmInterface(0).ClearInstructionCache();
Core::System::GetInstance().ArmInterface(1).ClearInstructionCache();
Core::System::GetInstance().ArmInterface(2).ClearInstructionCache();
Core::System::GetInstance().ArmInterface(3).ClearInstructionCache();
system.InvalidateCpuInstructionCaches();
}
Kernel::Process::Process(KernelCore& kernel) : WaitObject{kernel} {}
Kernel::Process::~Process() {}
Process::Process(Core::System& system)
: WaitObject{system.Kernel()}, address_arbiter{system}, system{system} {}
Process::~Process() = default;
void Process::Acquire(Thread* thread) {
ASSERT_MSG(!ShouldWait(thread), "Object unavailable!");

View File

@@ -12,12 +12,17 @@
#include <vector>
#include <boost/container/static_vector.hpp>
#include "common/common_types.h"
#include "core/hle/kernel/address_arbiter.h"
#include "core/hle/kernel/handle_table.h"
#include "core/hle/kernel/process_capability.h"
#include "core/hle/kernel/vm_manager.h"
#include "core/hle/kernel/wait_object.h"
#include "core/hle/result.h"
namespace Core {
class System;
}
namespace FileSys {
class ProgramMetadata;
}
@@ -116,7 +121,7 @@ public:
static constexpr std::size_t RANDOM_ENTROPY_SIZE = 4;
static SharedPtr<Process> Create(KernelCore& kernel, std::string&& name);
static SharedPtr<Process> Create(Core::System& system, std::string&& name);
std::string GetTypeName() const override {
return "Process";
@@ -150,6 +155,16 @@ public:
return handle_table;
}
/// Gets a reference to the process' address arbiter.
AddressArbiter& GetAddressArbiter() {
return address_arbiter;
}
/// Gets a const reference to the process' address arbiter.
const AddressArbiter& GetAddressArbiter() const {
return address_arbiter;
}
/// Gets the current status of the process
ProcessStatus GetStatus() const {
return status;
@@ -251,7 +266,7 @@ public:
void FreeTLSSlot(VAddr tls_address);
private:
explicit Process(KernelCore& kernel);
explicit Process(Core::System& system);
~Process() override;
/// Checks if the specified thread should wait until this process is available.
@@ -309,9 +324,16 @@ private:
/// Per-process handle table for storing created object handles in.
HandleTable handle_table;
/// Per-process address arbiter.
AddressArbiter address_arbiter;
/// Random values for svcGetInfo RandomEntropy
std::array<u64, RANDOM_ENTROPY_SIZE> random_entropy;
/// System context
Core::System& system;
/// Name of this process
std::string name;
};

View File

@@ -19,7 +19,8 @@ namespace Kernel {
std::mutex Scheduler::scheduler_mutex;
Scheduler::Scheduler(Core::ARM_Interface& cpu_core) : cpu_core(cpu_core) {}
Scheduler::Scheduler(Core::System& system, Core::ARM_Interface& cpu_core)
: cpu_core{cpu_core}, system{system} {}
Scheduler::~Scheduler() {
for (auto& thread : thread_list) {
@@ -61,7 +62,7 @@ Thread* Scheduler::PopNextReadyThread() {
void Scheduler::SwitchContext(Thread* new_thread) {
Thread* const previous_thread = GetCurrentThread();
Process* const previous_process = Core::CurrentProcess();
Process* const previous_process = system.Kernel().CurrentProcess();
UpdateLastContextSwitchTime(previous_thread, previous_process);
@@ -94,8 +95,8 @@ void Scheduler::SwitchContext(Thread* new_thread) {
auto* const thread_owner_process = current_thread->GetOwnerProcess();
if (previous_process != thread_owner_process) {
Core::System::GetInstance().Kernel().MakeCurrentProcess(thread_owner_process);
SetCurrentPageTable(&Core::CurrentProcess()->VMManager().page_table);
system.Kernel().MakeCurrentProcess(thread_owner_process);
SetCurrentPageTable(&thread_owner_process->VMManager().page_table);
}
cpu_core.LoadContext(new_thread->GetContext());
@@ -111,7 +112,7 @@ void Scheduler::SwitchContext(Thread* new_thread) {
void Scheduler::UpdateLastContextSwitchTime(Thread* thread, Process* process) {
const u64 prev_switch_ticks = last_context_switch_time;
const u64 most_recent_switch_ticks = Core::System::GetInstance().CoreTiming().GetTicks();
const u64 most_recent_switch_ticks = system.CoreTiming().GetTicks();
const u64 update_ticks = most_recent_switch_ticks - prev_switch_ticks;
if (thread != nullptr) {
@@ -223,8 +224,7 @@ void Scheduler::YieldWithLoadBalancing(Thread* thread) {
// Take the first non-nullptr one
for (unsigned cur_core = 0; cur_core < Core::NUM_CPU_CORES; ++cur_core) {
const auto res =
Core::System::GetInstance().CpuCore(cur_core).Scheduler().GetNextSuggestedThread(
core, priority);
system.CpuCore(cur_core).Scheduler().GetNextSuggestedThread(core, priority);
// If scheduler provides a suggested thread
if (res != nullptr) {

View File

@@ -13,7 +13,8 @@
namespace Core {
class ARM_Interface;
}
class System;
} // namespace Core
namespace Kernel {
@@ -21,7 +22,7 @@ class Process;
class Scheduler final {
public:
explicit Scheduler(Core::ARM_Interface& cpu_core);
explicit Scheduler(Core::System& system, Core::ARM_Interface& cpu_core);
~Scheduler();
/// Returns whether there are any threads that are ready to run.
@@ -162,6 +163,7 @@ private:
Core::ARM_Interface& cpu_core;
u64 last_context_switch_time = 0;
Core::System& system;
static std::mutex scheduler_mutex;
};

View File

@@ -26,6 +26,10 @@ ResultVal<SharedPtr<ServerSession>> ServerPort::Accept() {
return MakeResult(std::move(session));
}
void ServerPort::AppendPendingSession(SharedPtr<ServerSession> pending_session) {
pending_sessions.push_back(std::move(pending_session));
}
bool ServerPort::ShouldWait(Thread* thread) const {
// If there are no pending sessions, we wait until a new one is added.
return pending_sessions.empty();

View File

@@ -22,6 +22,8 @@ class SessionRequestHandler;
class ServerPort final : public WaitObject {
public:
using HLEHandler = std::shared_ptr<SessionRequestHandler>;
/**
* Creates a pair of ServerPort and an associated ClientPort.
*
@@ -51,22 +53,27 @@ public:
*/
ResultVal<SharedPtr<ServerSession>> Accept();
/// Whether or not this server port has an HLE handler available.
bool HasHLEHandler() const {
return hle_handler != nullptr;
}
/// Gets the HLE handler for this port.
HLEHandler GetHLEHandler() const {
return hle_handler;
}
/**
* Sets the HLE handler template for the port. ServerSessions crated by connecting to this port
* will inherit a reference to this handler.
*/
void SetHleHandler(std::shared_ptr<SessionRequestHandler> hle_handler_) {
void SetHleHandler(HLEHandler hle_handler_) {
hle_handler = std::move(hle_handler_);
}
std::string name; ///< Name of port (optional)
/// ServerSessions waiting to be accepted by the port
std::vector<SharedPtr<ServerSession>> pending_sessions;
/// This session's HLE request handler template (optional)
/// ServerSessions created from this port inherit a reference to this handler.
std::shared_ptr<SessionRequestHandler> hle_handler;
/// Appends a ServerSession to the collection of ServerSessions
/// waiting to be accepted by this port.
void AppendPendingSession(SharedPtr<ServerSession> pending_session);
bool ShouldWait(Thread* thread) const override;
void Acquire(Thread* thread) override;
@@ -74,6 +81,16 @@ public:
private:
explicit ServerPort(KernelCore& kernel);
~ServerPort() override;
/// ServerSessions waiting to be accepted by the port
std::vector<SharedPtr<ServerSession>> pending_sessions;
/// This session's HLE request handler template (optional)
/// ServerSessions created from this port inherit a reference to this handler.
HLEHandler hle_handler;
/// Name of the port (optional)
std::string name;
};
} // namespace Kernel

View File

@@ -63,42 +63,71 @@ void ServerSession::Acquire(Thread* thread) {
pending_requesting_threads.pop_back();
}
ResultCode ServerSession::HandleDomainSyncRequest(Kernel::HLERequestContext& context) {
auto* const domain_message_header = context.GetDomainMessageHeader();
if (domain_message_header) {
// Set domain handlers in HLE context, used for domain objects (IPC interfaces) as inputs
context.SetDomainRequestHandlers(domain_request_handlers);
// If there is a DomainMessageHeader, then this is CommandType "Request"
const u32 object_id{context.GetDomainMessageHeader()->object_id};
switch (domain_message_header->command) {
case IPC::DomainMessageHeader::CommandType::SendMessage:
if (object_id > domain_request_handlers.size()) {
LOG_CRITICAL(IPC,
"object_id {} is too big! This probably means a recent service call "
"to {} needed to return a new interface!",
object_id, name);
UNREACHABLE();
return RESULT_SUCCESS; // Ignore error if asserts are off
}
return domain_request_handlers[object_id - 1]->HandleSyncRequest(context);
case IPC::DomainMessageHeader::CommandType::CloseVirtualHandle: {
LOG_DEBUG(IPC, "CloseVirtualHandle, object_id=0x{:08X}", object_id);
domain_request_handlers[object_id - 1] = nullptr;
IPC::ResponseBuilder rb{context, 2};
rb.Push(RESULT_SUCCESS);
return RESULT_SUCCESS;
}
}
LOG_CRITICAL(IPC, "Unknown domain command={}",
static_cast<int>(domain_message_header->command.Value()));
ASSERT(false);
void ServerSession::ClientDisconnected() {
// We keep a shared pointer to the hle handler to keep it alive throughout
// the call to ClientDisconnected, as ClientDisconnected invalidates the
// hle_handler member itself during the course of the function executing.
std::shared_ptr<SessionRequestHandler> handler = hle_handler;
if (handler) {
// Note that after this returns, this server session's hle_handler is
// invalidated (set to null).
handler->ClientDisconnected(this);
}
// TODO(Subv): Force a wake up of all the ServerSession's waiting threads and set
// their WaitSynchronization result to 0xC920181A.
// Clean up the list of client threads with pending requests, they are unneeded now that the
// client endpoint is closed.
pending_requesting_threads.clear();
currently_handling = nullptr;
}
void ServerSession::AppendDomainRequestHandler(std::shared_ptr<SessionRequestHandler> handler) {
domain_request_handlers.push_back(std::move(handler));
}
std::size_t ServerSession::NumDomainRequestHandlers() const {
return domain_request_handlers.size();
}
ResultCode ServerSession::HandleDomainSyncRequest(Kernel::HLERequestContext& context) {
if (!context.HasDomainMessageHeader()) {
return RESULT_SUCCESS;
}
// Set domain handlers in HLE context, used for domain objects (IPC interfaces) as inputs
context.SetDomainRequestHandlers(domain_request_handlers);
// If there is a DomainMessageHeader, then this is CommandType "Request"
const auto& domain_message_header = context.GetDomainMessageHeader();
const u32 object_id{domain_message_header.object_id};
switch (domain_message_header.command) {
case IPC::DomainMessageHeader::CommandType::SendMessage:
if (object_id > domain_request_handlers.size()) {
LOG_CRITICAL(IPC,
"object_id {} is too big! This probably means a recent service call "
"to {} needed to return a new interface!",
object_id, name);
UNREACHABLE();
return RESULT_SUCCESS; // Ignore error if asserts are off
}
return domain_request_handlers[object_id - 1]->HandleSyncRequest(context);
case IPC::DomainMessageHeader::CommandType::CloseVirtualHandle: {
LOG_DEBUG(IPC, "CloseVirtualHandle, object_id=0x{:08X}", object_id);
domain_request_handlers[object_id - 1] = nullptr;
IPC::ResponseBuilder rb{context, 2};
rb.Push(RESULT_SUCCESS);
return RESULT_SUCCESS;
}
}
LOG_CRITICAL(IPC, "Unknown domain command={}",
static_cast<int>(domain_message_header.command.Value()));
ASSERT(false);
return RESULT_SUCCESS;
}

View File

@@ -46,6 +46,14 @@ public:
return HANDLE_TYPE;
}
Session* GetParent() {
return parent.get();
}
const Session* GetParent() const {
return parent.get();
}
using SessionPair = std::tuple<SharedPtr<ServerSession>, SharedPtr<ClientSession>>;
/**
@@ -78,23 +86,16 @@ public:
void Acquire(Thread* thread) override;
std::string name; ///< The name of this session (optional)
std::shared_ptr<Session> parent; ///< The parent session, which links to the client endpoint.
std::shared_ptr<SessionRequestHandler>
hle_handler; ///< This session's HLE request handler (applicable when not a domain)
/// Called when a client disconnection occurs.
void ClientDisconnected();
/// This is the list of domain request handlers (after conversion to a domain)
std::vector<std::shared_ptr<SessionRequestHandler>> domain_request_handlers;
/// Adds a new domain request handler to the collection of request handlers within
/// this ServerSession instance.
void AppendDomainRequestHandler(std::shared_ptr<SessionRequestHandler> handler);
/// List of threads that are pending a response after a sync request. This list is processed in
/// a LIFO manner, thus, the last request will be dispatched first.
/// TODO(Subv): Verify if this is indeed processed in LIFO using a hardware test.
std::vector<SharedPtr<Thread>> pending_requesting_threads;
/// Thread whose request is currently being handled. A request is considered "handled" when a
/// response is sent via svcReplyAndReceive.
/// TODO(Subv): Find a better name for this.
SharedPtr<Thread> currently_handling;
/// Retrieves the total number of domain request handlers that have been
/// appended to this ServerSession instance.
std::size_t NumDomainRequestHandlers() const;
/// Returns true if the session has been converted to a domain, otherwise False
bool IsDomain() const {
@@ -129,8 +130,30 @@ private:
/// object handle.
ResultCode HandleDomainSyncRequest(Kernel::HLERequestContext& context);
/// The parent session, which links to the client endpoint.
std::shared_ptr<Session> parent;
/// This session's HLE request handler (applicable when not a domain)
std::shared_ptr<SessionRequestHandler> hle_handler;
/// This is the list of domain request handlers (after conversion to a domain)
std::vector<std::shared_ptr<SessionRequestHandler>> domain_request_handlers;
/// List of threads that are pending a response after a sync request. This list is processed in
/// a LIFO manner, thus, the last request will be dispatched first.
/// TODO(Subv): Verify if this is indeed processed in LIFO using a hardware test.
std::vector<SharedPtr<Thread>> pending_requesting_threads;
/// Thread whose request is currently being handled. A request is considered "handled" when a
/// response is sent via svcReplyAndReceive.
/// TODO(Subv): Find a better name for this.
SharedPtr<Thread> currently_handling;
/// When set to True, converts the session to a domain at the end of the command
bool convert_to_domain{};
/// The name of this session (optional)
std::string name;
};
} // namespace Kernel

View File

@@ -6,7 +6,6 @@
#include "common/assert.h"
#include "common/logging/log.h"
#include "core/core.h"
#include "core/hle/kernel/errors.h"
#include "core/hle/kernel/kernel.h"
#include "core/hle/kernel/shared_memory.h"
@@ -34,8 +33,8 @@ SharedPtr<SharedMemory> SharedMemory::Create(KernelCore& kernel, Process* owner_
shared_memory->backing_block_offset = 0;
// Refresh the address mappings for the current process.
if (Core::CurrentProcess() != nullptr) {
Core::CurrentProcess()->VMManager().RefreshMemoryBlockMappings(
if (kernel.CurrentProcess() != nullptr) {
kernel.CurrentProcess()->VMManager().RefreshMemoryBlockMappings(
shared_memory->backing_block.get());
}
} else {

View File

@@ -20,6 +20,7 @@
#include "core/hle/kernel/address_arbiter.h"
#include "core/hle/kernel/client_port.h"
#include "core/hle/kernel/client_session.h"
#include "core/hle/kernel/errors.h"
#include "core/hle/kernel/handle_table.h"
#include "core/hle/kernel/kernel.h"
#include "core/hle/kernel/mutex.h"
@@ -1478,21 +1479,10 @@ static ResultCode WaitForAddress(VAddr address, u32 type, s32 value, s64 timeout
return ERR_INVALID_ADDRESS;
}
auto& address_arbiter = Core::System::GetInstance().Kernel().AddressArbiter();
switch (static_cast<AddressArbiter::ArbitrationType>(type)) {
case AddressArbiter::ArbitrationType::WaitIfLessThan:
return address_arbiter.WaitForAddressIfLessThan(address, value, timeout, false);
case AddressArbiter::ArbitrationType::DecrementAndWaitIfLessThan:
return address_arbiter.WaitForAddressIfLessThan(address, value, timeout, true);
case AddressArbiter::ArbitrationType::WaitIfEqual:
return address_arbiter.WaitForAddressIfEqual(address, value, timeout);
default:
LOG_ERROR(Kernel_SVC,
"Invalid arbitration type, expected WaitIfLessThan, DecrementAndWaitIfLessThan "
"or WaitIfEqual but got {}",
type);
return ERR_INVALID_ENUM_VALUE;
}
const auto arbitration_type = static_cast<AddressArbiter::ArbitrationType>(type);
auto& address_arbiter =
Core::System::GetInstance().Kernel().CurrentProcess()->GetAddressArbiter();
return address_arbiter.WaitForAddress(address, arbitration_type, value, timeout);
}
// Signals to an address (via Address Arbiter)
@@ -1510,22 +1500,10 @@ static ResultCode SignalToAddress(VAddr address, u32 type, s32 value, s32 num_to
return ERR_INVALID_ADDRESS;
}
auto& address_arbiter = Core::System::GetInstance().Kernel().AddressArbiter();
switch (static_cast<AddressArbiter::SignalType>(type)) {
case AddressArbiter::SignalType::Signal:
return address_arbiter.SignalToAddress(address, num_to_wake);
case AddressArbiter::SignalType::IncrementAndSignalIfEqual:
return address_arbiter.IncrementAndSignalToAddressIfEqual(address, value, num_to_wake);
case AddressArbiter::SignalType::ModifyByWaitingCountAndSignalIfEqual:
return address_arbiter.ModifyByWaitingCountAndSignalToAddressIfEqual(address, value,
num_to_wake);
default:
LOG_ERROR(Kernel_SVC,
"Invalid signal type, expected Signal, IncrementAndSignalIfEqual "
"or ModifyByWaitingCountAndSignalIfEqual but got {}",
type);
return ERR_INVALID_ENUM_VALUE;
}
const auto signal_type = static_cast<AddressArbiter::SignalType>(type);
auto& address_arbiter =
Core::System::GetInstance().Kernel().CurrentProcess()->GetAddressArbiter();
return address_arbiter.SignalToAddress(address, signal_type, value, num_to_wake);
}
/// This returns the total CPU ticks elapsed since the CPU was powered-on

View File

@@ -8,19 +8,10 @@
#include <utility>
#include "common/assert.h"
#include "common/bit_field.h"
#include "common/common_funcs.h"
#include "common/common_types.h"
// All the constants in this file come from http://switchbrew.org/index.php?title=Error_codes
/**
* Detailed description of the error. Code 0 always means success.
*/
enum class ErrorDescription : u32 {
Success = 0,
RemoteProcessDead = 301,
};
/**
* Identifies the module which caused the error. Error codes can be propagated through a call
* chain, meaning that this doesn't always correspond to the module where the API call made is
@@ -121,7 +112,7 @@ enum class ErrorModule : u32 {
ShopN = 811,
};
/// Encapsulates a CTR-OS error code, allowing it to be separated into its constituent fields.
/// Encapsulates a Horizon OS error code, allowing it to be separated into its constituent fields.
union ResultCode {
u32 raw;
@@ -134,17 +125,9 @@ union ResultCode {
constexpr explicit ResultCode(u32 raw) : raw(raw) {}
constexpr ResultCode(ErrorModule module, ErrorDescription description)
: ResultCode(module, static_cast<u32>(description)) {}
constexpr ResultCode(ErrorModule module_, u32 description_)
: raw(module.FormatValue(module_) | description.FormatValue(description_)) {}
constexpr ResultCode& operator=(const ResultCode& o) {
raw = o.raw;
return *this;
}
constexpr bool IsSuccess() const {
return raw == 0;
}

View File

@@ -7,6 +7,7 @@
#include "common/string_util.h"
#include "core/core.h"
#include "core/frontend/applets/software_keyboard.h"
#include "core/hle/result.h"
#include "core/hle/service/am/am.h"
#include "core/hle/service/am/applets/software_keyboard.h"

View File

@@ -9,10 +9,13 @@
#include <vector>
#include "common/common_funcs.h"
#include "common/common_types.h"
#include "common/swap.h"
#include "core/hle/service/am/am.h"
#include "core/hle/service/am/applets/applets.h"
union ResultCode;
namespace Service::AM::Applets {
enum class KeysetDisable : u32 {

View File

@@ -107,7 +107,9 @@ private:
void StopAudioOut(Kernel::HLERequestContext& ctx) {
LOG_DEBUG(Service_Audio, "called");
audio_core.StopStream(stream);
if (stream->IsPlaying()) {
audio_core.StopStream(stream);
}
IPC::ResponseBuilder rb{ctx, 2};
rb.Push(RESULT_SUCCESS);

View File

@@ -9,43 +9,32 @@
#include <opus.h>
#include "common/common_funcs.h"
#include "common/assert.h"
#include "common/logging/log.h"
#include "core/hle/ipc_helpers.h"
#include "core/hle/kernel/hle_ipc.h"
#include "core/hle/service/audio/hwopus.h"
namespace Service::Audio {
namespace {
struct OpusDeleter {
void operator()(void* ptr) const {
operator delete(ptr);
}
};
class IHardwareOpusDecoderManager final : public ServiceFramework<IHardwareOpusDecoderManager> {
using OpusDecoderPtr = std::unique_ptr<OpusDecoder, OpusDeleter>;
struct OpusPacketHeader {
// Packet size in bytes.
u32_be size;
// Indicates the final range of the codec's entropy coder.
u32_be final_range;
};
static_assert(sizeof(OpusPacketHeader) == 0x8, "OpusHeader is an invalid size");
class OpusDecoderStateBase {
public:
IHardwareOpusDecoderManager(std::unique_ptr<OpusDecoder, OpusDeleter> decoder, u32 sample_rate,
u32 channel_count)
: ServiceFramework("IHardwareOpusDecoderManager"), decoder(std::move(decoder)),
sample_rate(sample_rate), channel_count(channel_count) {
// clang-format off
static const FunctionInfo functions[] = {
{0, &IHardwareOpusDecoderManager::DecodeInterleavedOld, "DecodeInterleavedOld"},
{1, nullptr, "SetContext"},
{2, nullptr, "DecodeInterleavedForMultiStreamOld"},
{3, nullptr, "SetContextForMultiStream"},
{4, &IHardwareOpusDecoderManager::DecodeInterleavedWithPerfOld, "DecodeInterleavedWithPerfOld"},
{5, nullptr, "DecodeInterleavedForMultiStreamWithPerfOld"},
{6, &IHardwareOpusDecoderManager::DecodeInterleaved, "DecodeInterleaved"},
{7, nullptr, "DecodeInterleavedForMultiStream"},
};
// clang-format on
RegisterHandlers(functions);
}
private:
/// Describes extra behavior that may be asked of the decoding context.
enum class ExtraBehavior {
/// No extra behavior.
@@ -55,30 +44,36 @@ private:
ResetContext,
};
void DecodeInterleavedOld(Kernel::HLERequestContext& ctx) {
LOG_DEBUG(Audio, "called");
enum class PerfTime {
Disabled,
Enabled,
};
DecodeInterleavedHelper(ctx, nullptr, ExtraBehavior::None);
}
void DecodeInterleavedWithPerfOld(Kernel::HLERequestContext& ctx) {
LOG_DEBUG(Audio, "called");
u64 performance = 0;
DecodeInterleavedHelper(ctx, &performance, ExtraBehavior::None);
}
void DecodeInterleaved(Kernel::HLERequestContext& ctx) {
LOG_DEBUG(Audio, "called");
IPC::RequestParser rp{ctx};
const auto extra_behavior =
rp.Pop<bool>() ? ExtraBehavior::ResetContext : ExtraBehavior::None;
u64 performance = 0;
DecodeInterleavedHelper(ctx, &performance, extra_behavior);
virtual ~OpusDecoderStateBase() = default;
// Decodes interleaved Opus packets. Optionally allows reporting time taken to
// perform the decoding, as well as any relevant extra behavior.
virtual void DecodeInterleaved(Kernel::HLERequestContext& ctx, PerfTime perf_time,
ExtraBehavior extra_behavior) = 0;
};
// Represents the decoder state for a non-multistream decoder.
class OpusDecoderState final : public OpusDecoderStateBase {
public:
explicit OpusDecoderState(OpusDecoderPtr decoder, u32 sample_rate, u32 channel_count)
: decoder{std::move(decoder)}, sample_rate{sample_rate}, channel_count{channel_count} {}
void DecodeInterleaved(Kernel::HLERequestContext& ctx, PerfTime perf_time,
ExtraBehavior extra_behavior) override {
if (perf_time == PerfTime::Disabled) {
DecodeInterleavedHelper(ctx, nullptr, extra_behavior);
} else {
u64 performance = 0;
DecodeInterleavedHelper(ctx, &performance, extra_behavior);
}
}
private:
void DecodeInterleavedHelper(Kernel::HLERequestContext& ctx, u64* performance,
ExtraBehavior extra_behavior) {
u32 consumed = 0;
@@ -89,8 +84,7 @@ private:
ResetDecoderContext();
}
if (!Decoder_DecodeInterleaved(consumed, sample_count, ctx.ReadBuffer(), samples,
performance)) {
if (!DecodeOpusData(consumed, sample_count, ctx.ReadBuffer(), samples, performance)) {
LOG_ERROR(Audio, "Failed to decode opus data");
IPC::ResponseBuilder rb{ctx, 2};
// TODO(ogniK): Use correct error code
@@ -109,27 +103,27 @@ private:
ctx.WriteBuffer(samples.data(), samples.size() * sizeof(s16));
}
bool Decoder_DecodeInterleaved(u32& consumed, u32& sample_count, const std::vector<u8>& input,
std::vector<opus_int16>& output, u64* out_performance_time) {
bool DecodeOpusData(u32& consumed, u32& sample_count, const std::vector<u8>& input,
std::vector<opus_int16>& output, u64* out_performance_time) const {
const auto start_time = std::chrono::high_resolution_clock::now();
const std::size_t raw_output_sz = output.size() * sizeof(opus_int16);
if (sizeof(OpusHeader) > input.size()) {
if (sizeof(OpusPacketHeader) > input.size()) {
LOG_ERROR(Audio, "Input is smaller than the header size, header_sz={}, input_sz={}",
sizeof(OpusHeader), input.size());
sizeof(OpusPacketHeader), input.size());
return false;
}
OpusHeader hdr{};
std::memcpy(&hdr, input.data(), sizeof(OpusHeader));
if (sizeof(OpusHeader) + static_cast<u32>(hdr.sz) > input.size()) {
OpusPacketHeader hdr{};
std::memcpy(&hdr, input.data(), sizeof(OpusPacketHeader));
if (sizeof(OpusPacketHeader) + static_cast<u32>(hdr.size) > input.size()) {
LOG_ERROR(Audio, "Input does not fit in the opus header size. data_sz={}, input_sz={}",
sizeof(OpusHeader) + static_cast<u32>(hdr.sz), input.size());
sizeof(OpusPacketHeader) + static_cast<u32>(hdr.size), input.size());
return false;
}
const auto frame = input.data() + sizeof(OpusHeader);
const auto frame = input.data() + sizeof(OpusPacketHeader);
const auto decoded_sample_count = opus_packet_get_nb_samples(
frame, static_cast<opus_int32>(input.size() - sizeof(OpusHeader)),
frame, static_cast<opus_int32>(input.size() - sizeof(OpusPacketHeader)),
static_cast<opus_int32>(sample_rate));
if (decoded_sample_count * channel_count * sizeof(u16) > raw_output_sz) {
LOG_ERROR(
@@ -141,18 +135,18 @@ private:
const int frame_size = (static_cast<int>(raw_output_sz / sizeof(s16) / channel_count));
const auto out_sample_count =
opus_decode(decoder.get(), frame, hdr.sz, output.data(), frame_size, 0);
opus_decode(decoder.get(), frame, hdr.size, output.data(), frame_size, 0);
if (out_sample_count < 0) {
LOG_ERROR(Audio,
"Incorrect sample count received from opus_decode, "
"output_sample_count={}, frame_size={}, data_sz_from_hdr={}",
out_sample_count, frame_size, static_cast<u32>(hdr.sz));
out_sample_count, frame_size, static_cast<u32>(hdr.size));
return false;
}
const auto end_time = std::chrono::high_resolution_clock::now() - start_time;
sample_count = out_sample_count;
consumed = static_cast<u32>(sizeof(OpusHeader) + hdr.sz);
consumed = static_cast<u32>(sizeof(OpusPacketHeader) + hdr.size);
if (out_performance_time != nullptr) {
*out_performance_time =
std::chrono::duration_cast<std::chrono::milliseconds>(end_time).count();
@@ -167,21 +161,66 @@ private:
opus_decoder_ctl(decoder.get(), OPUS_RESET_STATE);
}
struct OpusHeader {
u32_be sz; // Needs to be BE for some odd reason
INSERT_PADDING_WORDS(1);
};
static_assert(sizeof(OpusHeader) == 0x8, "OpusHeader is an invalid size");
std::unique_ptr<OpusDecoder, OpusDeleter> decoder;
OpusDecoderPtr decoder;
u32 sample_rate;
u32 channel_count;
};
static std::size_t WorkerBufferSize(u32 channel_count) {
class IHardwareOpusDecoderManager final : public ServiceFramework<IHardwareOpusDecoderManager> {
public:
explicit IHardwareOpusDecoderManager(std::unique_ptr<OpusDecoderStateBase> decoder_state)
: ServiceFramework("IHardwareOpusDecoderManager"), decoder_state{std::move(decoder_state)} {
// clang-format off
static const FunctionInfo functions[] = {
{0, &IHardwareOpusDecoderManager::DecodeInterleavedOld, "DecodeInterleavedOld"},
{1, nullptr, "SetContext"},
{2, nullptr, "DecodeInterleavedForMultiStreamOld"},
{3, nullptr, "SetContextForMultiStream"},
{4, &IHardwareOpusDecoderManager::DecodeInterleavedWithPerfOld, "DecodeInterleavedWithPerfOld"},
{5, nullptr, "DecodeInterleavedForMultiStreamWithPerfOld"},
{6, &IHardwareOpusDecoderManager::DecodeInterleaved, "DecodeInterleaved"},
{7, nullptr, "DecodeInterleavedForMultiStream"},
};
// clang-format on
RegisterHandlers(functions);
}
private:
void DecodeInterleavedOld(Kernel::HLERequestContext& ctx) {
LOG_DEBUG(Audio, "called");
decoder_state->DecodeInterleaved(ctx, OpusDecoderStateBase::PerfTime::Disabled,
OpusDecoderStateBase::ExtraBehavior::None);
}
void DecodeInterleavedWithPerfOld(Kernel::HLERequestContext& ctx) {
LOG_DEBUG(Audio, "called");
decoder_state->DecodeInterleaved(ctx, OpusDecoderStateBase::PerfTime::Enabled,
OpusDecoderStateBase::ExtraBehavior::None);
}
void DecodeInterleaved(Kernel::HLERequestContext& ctx) {
LOG_DEBUG(Audio, "called");
IPC::RequestParser rp{ctx};
const auto extra_behavior = rp.Pop<bool>()
? OpusDecoderStateBase::ExtraBehavior::ResetContext
: OpusDecoderStateBase::ExtraBehavior::None;
decoder_state->DecodeInterleaved(ctx, OpusDecoderStateBase::PerfTime::Enabled,
extra_behavior);
}
std::unique_ptr<OpusDecoderStateBase> decoder_state;
};
std::size_t WorkerBufferSize(u32 channel_count) {
ASSERT_MSG(channel_count == 1 || channel_count == 2, "Invalid channel count");
return opus_decoder_get_size(static_cast<int>(channel_count));
}
} // Anonymous namespace
void HwOpus::GetWorkBufferSize(Kernel::HLERequestContext& ctx) {
IPC::RequestParser rp{ctx};
@@ -220,8 +259,7 @@ void HwOpus::OpenOpusDecoder(Kernel::HLERequestContext& ctx) {
const std::size_t worker_sz = WorkerBufferSize(channel_count);
ASSERT_MSG(buffer_sz >= worker_sz, "Worker buffer too large");
std::unique_ptr<OpusDecoder, OpusDeleter> decoder{
static_cast<OpusDecoder*>(operator new(worker_sz))};
OpusDecoderPtr decoder{static_cast<OpusDecoder*>(operator new(worker_sz))};
if (const int err = opus_decoder_init(decoder.get(), sample_rate, channel_count)) {
LOG_ERROR(Audio, "Failed to init opus decoder with error={}", err);
IPC::ResponseBuilder rb{ctx, 2};
@@ -232,8 +270,8 @@ void HwOpus::OpenOpusDecoder(Kernel::HLERequestContext& ctx) {
IPC::ResponseBuilder rb{ctx, 2, 0, 1};
rb.Push(RESULT_SUCCESS);
rb.PushIpcInterface<IHardwareOpusDecoderManager>(std::move(decoder), sample_rate,
channel_count);
rb.PushIpcInterface<IHardwareOpusDecoderManager>(
std::make_unique<OpusDecoderState>(std::move(decoder), sample_rate, channel_count));
}
HwOpus::HwOpus() : ServiceFramework("hwopus") {

View File

@@ -10,6 +10,7 @@
#include "core/core.h"
#include "core/hle/service/nvdrv/devices/nvhost_as_gpu.h"
#include "core/hle/service/nvdrv/devices/nvmap.h"
#include "core/memory.h"
#include "video_core/memory_manager.h"
#include "video_core/rasterizer_interface.h"
#include "video_core/renderer_base.h"
@@ -178,7 +179,7 @@ u32 nvhost_as_gpu::UnmapBuffer(const std::vector<u8>& input, std::vector<u8>& ou
auto& gpu = system_instance.GPU();
auto cpu_addr = gpu.MemoryManager().GpuToCpuAddress(params.offset);
ASSERT(cpu_addr);
gpu.FlushAndInvalidateRegion(*cpu_addr, itr->second.size);
gpu.FlushAndInvalidateRegion(ToCacheAddr(Memory::GetPointer(*cpu_addr)), itr->second.size);
params.offset = gpu.MemoryManager().UnmapBuffer(params.offset, itr->second.size);

View File

@@ -11,7 +11,6 @@
#include "core/hle/ipc.h"
#include "core/hle/ipc_helpers.h"
#include "core/hle/kernel/client_port.h"
#include "core/hle/kernel/handle_table.h"
#include "core/hle/kernel/kernel.h"
#include "core/hle/kernel/process.h"
#include "core/hle/kernel/server_port.h"
@@ -76,7 +75,8 @@ namespace Service {
* Creates a function string for logging, complete with the name (or header code, depending
* on what's passed in) the port name, and all the cmd_buff arguments.
*/
[[maybe_unused]] static std::string MakeFunctionString(const char* name, const char* port_name,
[[maybe_unused]] static std::string MakeFunctionString(std::string_view name,
std::string_view port_name,
const u32* cmd_buff) {
// Number of params == bits 0-5 + bits 6-11
int num_params = (cmd_buff[0] & 0x3F) + ((cmd_buff[0] >> 6) & 0x3F);
@@ -158,9 +158,7 @@ void ServiceFrameworkBase::InvokeRequest(Kernel::HLERequestContext& ctx) {
return ReportUnimplementedFunction(ctx, info);
}
LOG_TRACE(
Service, "{}",
MakeFunctionString(info->name, GetServiceName().c_str(), ctx.CommandBuffer()).c_str());
LOG_TRACE(Service, "{}", MakeFunctionString(info->name, GetServiceName(), ctx.CommandBuffer()));
handler_invoker(this, info->handler_callback, ctx);
}
@@ -169,7 +167,7 @@ ResultCode ServiceFrameworkBase::HandleSyncRequest(Kernel::HLERequestContext& co
case IPC::CommandType::Close: {
IPC::ResponseBuilder rb{context, 2};
rb.Push(RESULT_SUCCESS);
return ResultCode(ErrorModule::HIPC, ErrorDescription::RemoteProcessDead);
return IPC::ERR_REMOTE_PROCESS_DEAD;
}
case IPC::CommandType::ControlWithContext:
case IPC::CommandType::Control: {

View File

@@ -30,7 +30,7 @@ void Controller::DuplicateSession(Kernel::HLERequestContext& ctx) {
IPC::ResponseBuilder rb{ctx, 2, 0, 1, IPC::ResponseBuilder::Flags::AlwaysMoveHandles};
rb.Push(RESULT_SUCCESS);
Kernel::SharedPtr<Kernel::ClientSession> session{ctx.Session()->parent->client};
Kernel::SharedPtr<Kernel::ClientSession> session{ctx.Session()->GetParent()->client};
rb.PushMoveObjects(session);
LOG_DEBUG(Service, "session={}", session->GetObjectId());

View File

@@ -67,7 +67,7 @@ public:
if (port == nullptr) {
return nullptr;
}
return std::static_pointer_cast<T>(port->hle_handler);
return std::static_pointer_cast<T>(port->GetHLEHandler());
}
void InvokeControlRequest(Kernel::HLERequestContext& context);

View File

@@ -24,6 +24,7 @@
#include "core/hle/service/nvdrv/nvdrv.h"
#include "core/hle/service/nvflinger/buffer_queue.h"
#include "core/hle/service/nvflinger/nvflinger.h"
#include "core/hle/service/service.h"
#include "core/hle/service/vi/vi.h"
#include "core/hle/service/vi/vi_m.h"
#include "core/hle/service/vi/vi_s.h"
@@ -33,6 +34,7 @@
namespace Service::VI {
constexpr ResultCode ERR_OPERATION_FAILED{ErrorModule::VI, 1};
constexpr ResultCode ERR_PERMISSION_DENIED{ErrorModule::VI, 5};
constexpr ResultCode ERR_UNSUPPORTED{ErrorModule::VI, 6};
constexpr ResultCode ERR_NOT_FOUND{ErrorModule::VI, 7};
@@ -1203,26 +1205,40 @@ IApplicationDisplayService::IApplicationDisplayService(
RegisterHandlers(functions);
}
Module::Interface::Interface(std::shared_ptr<Module> module, const char* name,
std::shared_ptr<NVFlinger::NVFlinger> nv_flinger)
: ServiceFramework(name), module(std::move(module)), nv_flinger(std::move(nv_flinger)) {}
static bool IsValidServiceAccess(Permission permission, Policy policy) {
if (permission == Permission::User) {
return policy == Policy::User;
}
Module::Interface::~Interface() = default;
if (permission == Permission::System || permission == Permission::Manager) {
return policy == Policy::User || policy == Policy::Compositor;
}
void Module::Interface::GetDisplayService(Kernel::HLERequestContext& ctx) {
LOG_WARNING(Service_VI, "(STUBBED) called");
return false;
}
void detail::GetDisplayServiceImpl(Kernel::HLERequestContext& ctx,
std::shared_ptr<NVFlinger::NVFlinger> nv_flinger,
Permission permission) {
IPC::RequestParser rp{ctx};
const auto policy = rp.PopEnum<Policy>();
if (!IsValidServiceAccess(permission, policy)) {
IPC::ResponseBuilder rb{ctx, 2};
rb.Push(ERR_PERMISSION_DENIED);
return;
}
IPC::ResponseBuilder rb{ctx, 2, 0, 1};
rb.Push(RESULT_SUCCESS);
rb.PushIpcInterface<IApplicationDisplayService>(nv_flinger);
rb.PushIpcInterface<IApplicationDisplayService>(std::move(nv_flinger));
}
void InstallInterfaces(SM::ServiceManager& service_manager,
std::shared_ptr<NVFlinger::NVFlinger> nv_flinger) {
auto module = std::make_shared<Module>();
std::make_shared<VI_M>(module, nv_flinger)->InstallAsService(service_manager);
std::make_shared<VI_S>(module, nv_flinger)->InstallAsService(service_manager);
std::make_shared<VI_U>(module, nv_flinger)->InstallAsService(service_manager);
std::make_shared<VI_M>(nv_flinger)->InstallAsService(service_manager);
std::make_shared<VI_S>(nv_flinger)->InstallAsService(service_manager);
std::make_shared<VI_U>(nv_flinger)->InstallAsService(service_manager);
}
} // namespace Service::VI

View File

@@ -4,12 +4,21 @@
#pragma once
#include "core/hle/service/service.h"
#include <memory>
#include "common/common_types.h"
namespace Kernel {
class HLERequestContext;
}
namespace Service::NVFlinger {
class NVFlinger;
}
namespace Service::SM {
class ServiceManager;
}
namespace Service::VI {
enum class DisplayResolution : u32 {
@@ -19,22 +28,25 @@ enum class DisplayResolution : u32 {
UndockedHeight = 720,
};
class Module final {
public:
class Interface : public ServiceFramework<Interface> {
public:
explicit Interface(std::shared_ptr<Module> module, const char* name,
std::shared_ptr<NVFlinger::NVFlinger> nv_flinger);
~Interface() override;
void GetDisplayService(Kernel::HLERequestContext& ctx);
protected:
std::shared_ptr<Module> module;
std::shared_ptr<NVFlinger::NVFlinger> nv_flinger;
};
/// Permission level for a particular VI service instance
enum class Permission {
User,
System,
Manager,
};
/// A policy type that may be requested via GetDisplayService and
/// GetDisplayServiceWithProxyNameExchange
enum class Policy {
User,
Compositor,
};
namespace detail {
void GetDisplayServiceImpl(Kernel::HLERequestContext& ctx,
std::shared_ptr<NVFlinger::NVFlinger> nv_flinger, Permission permission);
} // namespace detail
/// Registers all VI services with the specified service manager.
void InstallInterfaces(SM::ServiceManager& service_manager,
std::shared_ptr<NVFlinger::NVFlinger> nv_flinger);

View File

@@ -2,12 +2,14 @@
// Licensed under GPLv2 or any later version
// Refer to the license.txt file included.
#include "common/logging/log.h"
#include "core/hle/service/vi/vi.h"
#include "core/hle/service/vi/vi_m.h"
namespace Service::VI {
VI_M::VI_M(std::shared_ptr<Module> module, std::shared_ptr<NVFlinger::NVFlinger> nv_flinger)
: Module::Interface(std::move(module), "vi:m", std::move(nv_flinger)) {
VI_M::VI_M(std::shared_ptr<NVFlinger::NVFlinger> nv_flinger)
: ServiceFramework{"vi:m"}, nv_flinger{std::move(nv_flinger)} {
static const FunctionInfo functions[] = {
{2, &VI_M::GetDisplayService, "GetDisplayService"},
{3, nullptr, "GetDisplayServiceWithProxyNameExchange"},
@@ -17,4 +19,10 @@ VI_M::VI_M(std::shared_ptr<Module> module, std::shared_ptr<NVFlinger::NVFlinger>
VI_M::~VI_M() = default;
void VI_M::GetDisplayService(Kernel::HLERequestContext& ctx) {
LOG_DEBUG(Service_VI, "called");
detail::GetDisplayServiceImpl(ctx, nv_flinger, Permission::Manager);
}
} // namespace Service::VI

View File

@@ -4,14 +4,27 @@
#pragma once
#include "core/hle/service/vi/vi.h"
#include "core/hle/service/service.h"
namespace Kernel {
class HLERequestContext;
}
namespace Service::NVFlinger {
class NVFlinger;
}
namespace Service::VI {
class VI_M final : public Module::Interface {
class VI_M final : public ServiceFramework<VI_M> {
public:
explicit VI_M(std::shared_ptr<Module> module, std::shared_ptr<NVFlinger::NVFlinger> nv_flinger);
explicit VI_M(std::shared_ptr<NVFlinger::NVFlinger> nv_flinger);
~VI_M() override;
private:
void GetDisplayService(Kernel::HLERequestContext& ctx);
std::shared_ptr<NVFlinger::NVFlinger> nv_flinger;
};
} // namespace Service::VI

View File

@@ -2,12 +2,14 @@
// Licensed under GPLv2 or any later version
// Refer to the license.txt file included.
#include "common/logging/log.h"
#include "core/hle/service/vi/vi.h"
#include "core/hle/service/vi/vi_s.h"
namespace Service::VI {
VI_S::VI_S(std::shared_ptr<Module> module, std::shared_ptr<NVFlinger::NVFlinger> nv_flinger)
: Module::Interface(std::move(module), "vi:s", std::move(nv_flinger)) {
VI_S::VI_S(std::shared_ptr<NVFlinger::NVFlinger> nv_flinger)
: ServiceFramework{"vi:s"}, nv_flinger{std::move(nv_flinger)} {
static const FunctionInfo functions[] = {
{1, &VI_S::GetDisplayService, "GetDisplayService"},
{3, nullptr, "GetDisplayServiceWithProxyNameExchange"},
@@ -17,4 +19,10 @@ VI_S::VI_S(std::shared_ptr<Module> module, std::shared_ptr<NVFlinger::NVFlinger>
VI_S::~VI_S() = default;
void VI_S::GetDisplayService(Kernel::HLERequestContext& ctx) {
LOG_DEBUG(Service_VI, "called");
detail::GetDisplayServiceImpl(ctx, nv_flinger, Permission::System);
}
} // namespace Service::VI

View File

@@ -4,14 +4,27 @@
#pragma once
#include "core/hle/service/vi/vi.h"
#include "core/hle/service/service.h"
namespace Kernel {
class HLERequestContext;
}
namespace Service::NVFlinger {
class NVFlinger;
}
namespace Service::VI {
class VI_S final : public Module::Interface {
class VI_S final : public ServiceFramework<VI_S> {
public:
explicit VI_S(std::shared_ptr<Module> module, std::shared_ptr<NVFlinger::NVFlinger> nv_flinger);
explicit VI_S(std::shared_ptr<NVFlinger::NVFlinger> nv_flinger);
~VI_S() override;
private:
void GetDisplayService(Kernel::HLERequestContext& ctx);
std::shared_ptr<NVFlinger::NVFlinger> nv_flinger;
};
} // namespace Service::VI

View File

@@ -2,12 +2,14 @@
// Licensed under GPLv2 or any later version
// Refer to the license.txt file included.
#include "common/logging/log.h"
#include "core/hle/service/vi/vi.h"
#include "core/hle/service/vi/vi_u.h"
namespace Service::VI {
VI_U::VI_U(std::shared_ptr<Module> module, std::shared_ptr<NVFlinger::NVFlinger> nv_flinger)
: Module::Interface(std::move(module), "vi:u", std::move(nv_flinger)) {
VI_U::VI_U(std::shared_ptr<NVFlinger::NVFlinger> nv_flinger)
: ServiceFramework{"vi:u"}, nv_flinger{std::move(nv_flinger)} {
static const FunctionInfo functions[] = {
{0, &VI_U::GetDisplayService, "GetDisplayService"},
};
@@ -16,4 +18,10 @@ VI_U::VI_U(std::shared_ptr<Module> module, std::shared_ptr<NVFlinger::NVFlinger>
VI_U::~VI_U() = default;
void VI_U::GetDisplayService(Kernel::HLERequestContext& ctx) {
LOG_DEBUG(Service_VI, "called");
detail::GetDisplayServiceImpl(ctx, nv_flinger, Permission::User);
}
} // namespace Service::VI

View File

@@ -4,14 +4,27 @@
#pragma once
#include "core/hle/service/vi/vi.h"
#include "core/hle/service/service.h"
namespace Kernel {
class HLERequestContext;
}
namespace Service::NVFlinger {
class NVFlinger;
}
namespace Service::VI {
class VI_U final : public Module::Interface {
class VI_U final : public ServiceFramework<VI_U> {
public:
explicit VI_U(std::shared_ptr<Module> module, std::shared_ptr<NVFlinger::NVFlinger> nv_flinger);
explicit VI_U(std::shared_ptr<NVFlinger::NVFlinger> nv_flinger);
~VI_U() override;
private:
void GetDisplayService(Kernel::HLERequestContext& ctx);
std::shared_ptr<NVFlinger::NVFlinger> nv_flinger;
};
} // namespace Service::VI

View File

@@ -18,6 +18,7 @@
#include "core/hle/lock.h"
#include "core/memory.h"
#include "core/memory_setup.h"
#include "video_core/gpu.h"
#include "video_core/renderer_base.h"
namespace Memory {
@@ -67,8 +68,11 @@ static void MapPages(PageTable& page_table, VAddr base, u64 size, u8* memory, Pa
LOG_DEBUG(HW_Memory, "Mapping {} onto {:016X}-{:016X}", fmt::ptr(memory), base * PAGE_SIZE,
(base + size) * PAGE_SIZE);
RasterizerFlushVirtualRegion(base << PAGE_BITS, size * PAGE_SIZE,
FlushMode::FlushAndInvalidate);
// During boot, current_page_table might not be set yet, in which case we need not flush
if (current_page_table) {
Core::System::GetInstance().GPU().FlushAndInvalidateRegion(base << PAGE_BITS,
size * PAGE_SIZE);
}
VAddr end = base + size;
ASSERT_MSG(end <= page_table.pointers.size(), "out of range mapping at {:016X}",
@@ -180,10 +184,10 @@ T Read(const VAddr vaddr) {
ASSERT_MSG(false, "Mapped memory page without a pointer @ {:016X}", vaddr);
break;
case PageType::RasterizerCachedMemory: {
RasterizerFlushVirtualRegion(vaddr, sizeof(T), FlushMode::Flush);
auto host_ptr{GetPointerFromVMA(vaddr)};
Core::System::GetInstance().GPU().FlushRegion(ToCacheAddr(host_ptr), sizeof(T));
T value;
std::memcpy(&value, GetPointerFromVMA(vaddr), sizeof(T));
std::memcpy(&value, host_ptr, sizeof(T));
return value;
}
default:
@@ -211,8 +215,9 @@ void Write(const VAddr vaddr, const T data) {
ASSERT_MSG(false, "Mapped memory page without a pointer @ {:016X}", vaddr);
break;
case PageType::RasterizerCachedMemory: {
RasterizerFlushVirtualRegion(vaddr, sizeof(T), FlushMode::Invalidate);
std::memcpy(GetPointerFromVMA(vaddr), &data, sizeof(T));
auto host_ptr{GetPointerFromVMA(vaddr)};
Core::System::GetInstance().GPU().InvalidateRegion(ToCacheAddr(host_ptr), sizeof(T));
std::memcpy(host_ptr, &data, sizeof(T));
break;
}
default:
@@ -335,47 +340,6 @@ void RasterizerMarkRegionCached(VAddr vaddr, u64 size, bool cached) {
}
}
void RasterizerFlushVirtualRegion(VAddr start, u64 size, FlushMode mode) {
auto& system_instance = Core::System::GetInstance();
// Since pages are unmapped on shutdown after video core is shutdown, the renderer may be
// null here
if (!system_instance.IsPoweredOn()) {
return;
}
const VAddr end = start + size;
const auto CheckRegion = [&](VAddr region_start, VAddr region_end) {
if (start >= region_end || end <= region_start) {
// No overlap with region
return;
}
const VAddr overlap_start = std::max(start, region_start);
const VAddr overlap_end = std::min(end, region_end);
const VAddr overlap_size = overlap_end - overlap_start;
auto& gpu = system_instance.GPU();
switch (mode) {
case FlushMode::Flush:
gpu.FlushRegion(overlap_start, overlap_size);
break;
case FlushMode::Invalidate:
gpu.InvalidateRegion(overlap_start, overlap_size);
break;
case FlushMode::FlushAndInvalidate:
gpu.FlushAndInvalidateRegion(overlap_start, overlap_size);
break;
}
};
const auto& vm_manager = Core::CurrentProcess()->VMManager();
CheckRegion(vm_manager.GetCodeRegionBaseAddress(), vm_manager.GetCodeRegionEndAddress());
CheckRegion(vm_manager.GetHeapRegionBaseAddress(), vm_manager.GetHeapRegionEndAddress());
}
u8 Read8(const VAddr addr) {
return Read<u8>(addr);
}
@@ -421,9 +385,9 @@ void ReadBlock(const Kernel::Process& process, const VAddr src_addr, void* dest_
break;
}
case PageType::RasterizerCachedMemory: {
RasterizerFlushVirtualRegion(current_vaddr, static_cast<u32>(copy_amount),
FlushMode::Flush);
std::memcpy(dest_buffer, GetPointerFromVMA(process, current_vaddr), copy_amount);
const auto& host_ptr{GetPointerFromVMA(process, current_vaddr)};
Core::System::GetInstance().GPU().FlushRegion(ToCacheAddr(host_ptr), copy_amount);
std::memcpy(dest_buffer, host_ptr, copy_amount);
break;
}
default:
@@ -484,9 +448,9 @@ void WriteBlock(const Kernel::Process& process, const VAddr dest_addr, const voi
break;
}
case PageType::RasterizerCachedMemory: {
RasterizerFlushVirtualRegion(current_vaddr, static_cast<u32>(copy_amount),
FlushMode::Invalidate);
std::memcpy(GetPointerFromVMA(process, current_vaddr), src_buffer, copy_amount);
const auto& host_ptr{GetPointerFromVMA(process, current_vaddr)};
Core::System::GetInstance().GPU().InvalidateRegion(ToCacheAddr(host_ptr), copy_amount);
std::memcpy(host_ptr, src_buffer, copy_amount);
break;
}
default:
@@ -530,9 +494,9 @@ void ZeroBlock(const Kernel::Process& process, const VAddr dest_addr, const std:
break;
}
case PageType::RasterizerCachedMemory: {
RasterizerFlushVirtualRegion(current_vaddr, static_cast<u32>(copy_amount),
FlushMode::Invalidate);
std::memset(GetPointerFromVMA(process, current_vaddr), 0, copy_amount);
const auto& host_ptr{GetPointerFromVMA(process, current_vaddr)};
Core::System::GetInstance().GPU().InvalidateRegion(ToCacheAddr(host_ptr), copy_amount);
std::memset(host_ptr, 0, copy_amount);
break;
}
default:
@@ -572,9 +536,9 @@ void CopyBlock(const Kernel::Process& process, VAddr dest_addr, VAddr src_addr,
break;
}
case PageType::RasterizerCachedMemory: {
RasterizerFlushVirtualRegion(current_vaddr, static_cast<u32>(copy_amount),
FlushMode::Flush);
WriteBlock(process, dest_addr, GetPointerFromVMA(process, current_vaddr), copy_amount);
const auto& host_ptr{GetPointerFromVMA(process, current_vaddr)};
Core::System::GetInstance().GPU().FlushRegion(ToCacheAddr(host_ptr), copy_amount);
WriteBlock(process, dest_addr, host_ptr, copy_amount);
break;
}
default:

View File

@@ -161,10 +161,4 @@ enum class FlushMode {
*/
void RasterizerMarkRegionCached(VAddr vaddr, u64 size, bool cached);
/**
* Flushes and invalidates any externally cached rasterizer resources touching the given virtual
* address region.
*/
void RasterizerFlushVirtualRegion(VAddr start, u64 size, FlushMode mode);
} // namespace Memory

View File

@@ -91,7 +91,10 @@ void LogSettings() {
LogSetting("Renderer_UseResolutionFactor", Settings::values.resolution_factor);
LogSetting("Renderer_UseFrameLimit", Settings::values.use_frame_limit);
LogSetting("Renderer_FrameLimit", Settings::values.frame_limit);
LogSetting("Renderer_UseDiskShaderCache", Settings::values.use_disk_shader_cache);
LogSetting("Renderer_UseAccurateGpuEmulation", Settings::values.use_accurate_gpu_emulation);
LogSetting("Renderer_UseAsynchronousGpuEmulation",
Settings::values.use_asynchronous_gpu_emulation);
LogSetting("Audio_OutputEngine", Settings::values.sink_id);
LogSetting("Audio_EnableAudioStretching", Settings::values.enable_audio_stretching);
LogSetting("Audio_OutputDevice", Settings::values.audio_device_id);

View File

@@ -7,15 +7,18 @@ add_library(input_common STATIC
main.h
motion_emu.cpp
motion_emu.h
$<$<BOOL:${SDL2_FOUND}>:sdl/sdl.cpp sdl/sdl.h>
sdl/sdl.cpp
sdl/sdl.h
)
create_target_directory_groups(input_common)
target_link_libraries(input_common PUBLIC core PRIVATE common)
if(SDL2_FOUND)
target_sources(input_common PRIVATE
sdl/sdl_impl.cpp
sdl/sdl_impl.h
)
target_link_libraries(input_common PRIVATE SDL2)
target_compile_definitions(input_common PRIVATE HAVE_SDL2)
endif()
create_target_directory_groups(input_common)
target_link_libraries(input_common PUBLIC core PRIVATE common)

View File

@@ -17,10 +17,7 @@ namespace InputCommon {
static std::shared_ptr<Keyboard> keyboard;
static std::shared_ptr<MotionEmu> motion_emu;
#ifdef HAVE_SDL2
static std::thread poll_thread;
#endif
static std::unique_ptr<SDL::State> sdl;
void Init() {
keyboard = std::make_shared<Keyboard>();
@@ -30,15 +27,7 @@ void Init() {
motion_emu = std::make_shared<MotionEmu>();
Input::RegisterFactory<Input::MotionDevice>("motion_emu", motion_emu);
#ifdef HAVE_SDL2
SDL::Init();
#endif
}
void StartJoystickEventHandler() {
#ifdef HAVE_SDL2
poll_thread = std::thread(SDL::PollLoop);
#endif
sdl = SDL::Init();
}
void Shutdown() {
@@ -47,11 +36,7 @@ void Shutdown() {
Input::UnregisterFactory<Input::AnalogDevice>("analog_from_button");
Input::UnregisterFactory<Input::MotionDevice>("motion_emu");
motion_emu.reset();
#ifdef HAVE_SDL2
SDL::Shutdown();
poll_thread.join();
#endif
sdl.reset();
}
Keyboard* GetKeyboard() {
@@ -88,7 +73,7 @@ namespace Polling {
std::vector<std::unique_ptr<DevicePoller>> GetPollers(DeviceType type) {
#ifdef HAVE_SDL2
return SDL::Polling::GetPollers(type);
return sdl->GetPollers(type);
#else
return {};
#endif

View File

@@ -20,8 +20,6 @@ void Init();
/// Deregisters all built-in input device factories and shuts them down.
void Shutdown();
void StartJoystickEventHandler();
class Keyboard;
/// Gets the keyboard button device factory.

View File

@@ -1,631 +1,19 @@
// Copyright 2017 Citra Emulator Project
// Copyright 2018 Citra Emulator Project
// Licensed under GPLv2 or any later version
// Refer to the license.txt file included.
#include <algorithm>
#include <atomic>
#include <cmath>
#include <functional>
#include <iterator>
#include <mutex>
#include <string>
#include <thread>
#include <tuple>
#include <unordered_map>
#include <utility>
#include <vector>
#include <SDL.h>
#include "common/assert.h"
#include "common/logging/log.h"
#include "common/math_util.h"
#include "common/param_package.h"
#include "common/threadsafe_queue.h"
#include "input_common/main.h"
#include "input_common/sdl/sdl.h"
#ifdef HAVE_SDL2
#include "input_common/sdl/sdl_impl.h"
#endif
namespace InputCommon {
namespace InputCommon::SDL {
namespace SDL {
class SDLJoystick;
class SDLButtonFactory;
class SDLAnalogFactory;
/// Map of GUID of a list of corresponding virtual Joysticks
static std::unordered_map<std::string, std::vector<std::shared_ptr<SDLJoystick>>> joystick_map;
static std::mutex joystick_map_mutex;
static std::shared_ptr<SDLButtonFactory> button_factory;
static std::shared_ptr<SDLAnalogFactory> analog_factory;
/// Used by the Pollers during config
static std::atomic<bool> polling;
static Common::SPSCQueue<SDL_Event> event_queue;
static std::atomic<bool> initialized = false;
static std::string GetGUID(SDL_Joystick* joystick) {
SDL_JoystickGUID guid = SDL_JoystickGetGUID(joystick);
char guid_str[33];
SDL_JoystickGetGUIDString(guid, guid_str, sizeof(guid_str));
return guid_str;
std::unique_ptr<State> Init() {
#ifdef HAVE_SDL2
return std::make_unique<SDLState>();
#else
return std::make_unique<NullState>();
#endif
}
class SDLJoystick {
public:
SDLJoystick(std::string guid_, int port_, SDL_Joystick* joystick,
decltype(&SDL_JoystickClose) deleter = &SDL_JoystickClose)
: guid{std::move(guid_)}, port{port_}, sdl_joystick{joystick, deleter} {}
void SetButton(int button, bool value) {
std::lock_guard<std::mutex> lock(mutex);
state.buttons[button] = value;
}
bool GetButton(int button) const {
std::lock_guard<std::mutex> lock(mutex);
return state.buttons.at(button);
}
void SetAxis(int axis, Sint16 value) {
std::lock_guard<std::mutex> lock(mutex);
state.axes[axis] = value;
}
float GetAxis(int axis) const {
std::lock_guard<std::mutex> lock(mutex);
return state.axes.at(axis) / 32767.0f;
}
std::tuple<float, float> GetAnalog(int axis_x, int axis_y) const {
float x = GetAxis(axis_x);
float y = GetAxis(axis_y);
y = -y; // 3DS uses an y-axis inverse from SDL
// Make sure the coordinates are in the unit circle,
// otherwise normalize it.
float r = x * x + y * y;
if (r > 1.0f) {
r = std::sqrt(r);
x /= r;
y /= r;
}
return std::make_tuple(x, y);
}
void SetHat(int hat, Uint8 direction) {
std::lock_guard<std::mutex> lock(mutex);
state.hats[hat] = direction;
}
bool GetHatDirection(int hat, Uint8 direction) const {
std::lock_guard<std::mutex> lock(mutex);
return (state.hats.at(hat) & direction) != 0;
}
/**
* The guid of the joystick
*/
const std::string& GetGUID() const {
return guid;
}
/**
* The number of joystick from the same type that were connected before this joystick
*/
int GetPort() const {
return port;
}
SDL_Joystick* GetSDLJoystick() const {
return sdl_joystick.get();
}
void SetSDLJoystick(SDL_Joystick* joystick,
decltype(&SDL_JoystickClose) deleter = &SDL_JoystickClose) {
sdl_joystick =
std::unique_ptr<SDL_Joystick, decltype(&SDL_JoystickClose)>(joystick, deleter);
}
private:
struct State {
std::unordered_map<int, bool> buttons;
std::unordered_map<int, Sint16> axes;
std::unordered_map<int, Uint8> hats;
} state;
std::string guid;
int port;
std::unique_ptr<SDL_Joystick, decltype(&SDL_JoystickClose)> sdl_joystick;
mutable std::mutex mutex;
};
/**
* Get the nth joystick with the corresponding GUID
*/
static std::shared_ptr<SDLJoystick> GetSDLJoystickByGUID(const std::string& guid, int port) {
std::lock_guard<std::mutex> lock(joystick_map_mutex);
const auto it = joystick_map.find(guid);
if (it != joystick_map.end()) {
while (it->second.size() <= port) {
auto joystick = std::make_shared<SDLJoystick>(guid, it->second.size(), nullptr,
[](SDL_Joystick*) {});
it->second.emplace_back(std::move(joystick));
}
return it->second[port];
}
auto joystick = std::make_shared<SDLJoystick>(guid, 0, nullptr, [](SDL_Joystick*) {});
return joystick_map[guid].emplace_back(std::move(joystick));
}
/**
* Check how many identical joysticks (by guid) were connected before the one with sdl_id and so tie
* it to a SDLJoystick with the same guid and that port
*/
static std::shared_ptr<SDLJoystick> GetSDLJoystickBySDLID(SDL_JoystickID sdl_id) {
std::lock_guard<std::mutex> lock(joystick_map_mutex);
auto sdl_joystick = SDL_JoystickFromInstanceID(sdl_id);
const std::string guid = GetGUID(sdl_joystick);
auto map_it = joystick_map.find(guid);
if (map_it != joystick_map.end()) {
auto vec_it = std::find_if(map_it->second.begin(), map_it->second.end(),
[&sdl_joystick](const std::shared_ptr<SDLJoystick>& joystick) {
return sdl_joystick == joystick->GetSDLJoystick();
});
if (vec_it != map_it->second.end()) {
// This is the common case: There is already an existing SDL_Joystick maped to a
// SDLJoystick. return the SDLJoystick
return *vec_it;
}
// Search for a SDLJoystick without a mapped SDL_Joystick...
auto nullptr_it = std::find_if(map_it->second.begin(), map_it->second.end(),
[](const std::shared_ptr<SDLJoystick>& joystick) {
return !joystick->GetSDLJoystick();
});
if (nullptr_it != map_it->second.end()) {
// ... and map it
(*nullptr_it)->SetSDLJoystick(sdl_joystick);
return *nullptr_it;
}
// There is no SDLJoystick without a mapped SDL_Joystick
// Create a new SDLJoystick
auto joystick = std::make_shared<SDLJoystick>(guid, map_it->second.size(), sdl_joystick);
return map_it->second.emplace_back(std::move(joystick));
}
auto joystick = std::make_shared<SDLJoystick>(guid, 0, sdl_joystick);
return joystick_map[guid].emplace_back(std::move(joystick));
}
void InitJoystick(int joystick_index) {
std::lock_guard<std::mutex> lock(joystick_map_mutex);
SDL_Joystick* sdl_joystick = SDL_JoystickOpen(joystick_index);
if (!sdl_joystick) {
LOG_ERROR(Input, "failed to open joystick {}", joystick_index);
return;
}
std::string guid = GetGUID(sdl_joystick);
if (joystick_map.find(guid) == joystick_map.end()) {
auto joystick = std::make_shared<SDLJoystick>(guid, 0, sdl_joystick);
joystick_map[guid].emplace_back(std::move(joystick));
return;
}
auto& joystick_guid_list = joystick_map[guid];
const auto it = std::find_if(
joystick_guid_list.begin(), joystick_guid_list.end(),
[](const std::shared_ptr<SDLJoystick>& joystick) { return !joystick->GetSDLJoystick(); });
if (it != joystick_guid_list.end()) {
(*it)->SetSDLJoystick(sdl_joystick);
return;
}
auto joystick = std::make_shared<SDLJoystick>(guid, joystick_guid_list.size(), sdl_joystick);
joystick_guid_list.emplace_back(std::move(joystick));
}
void CloseJoystick(SDL_Joystick* sdl_joystick) {
std::lock_guard<std::mutex> lock(joystick_map_mutex);
std::string guid = GetGUID(sdl_joystick);
// This call to guid is save since the joystick is guranteed to be in that map
auto& joystick_guid_list = joystick_map[guid];
const auto joystick_it =
std::find_if(joystick_guid_list.begin(), joystick_guid_list.end(),
[&sdl_joystick](const std::shared_ptr<SDLJoystick>& joystick) {
return joystick->GetSDLJoystick() == sdl_joystick;
});
(*joystick_it)->SetSDLJoystick(nullptr, [](SDL_Joystick*) {});
}
void HandleGameControllerEvent(const SDL_Event& event) {
switch (event.type) {
case SDL_JOYBUTTONUP: {
auto joystick = GetSDLJoystickBySDLID(event.jbutton.which);
if (joystick) {
joystick->SetButton(event.jbutton.button, false);
}
break;
}
case SDL_JOYBUTTONDOWN: {
auto joystick = GetSDLJoystickBySDLID(event.jbutton.which);
if (joystick) {
joystick->SetButton(event.jbutton.button, true);
}
break;
}
case SDL_JOYHATMOTION: {
auto joystick = GetSDLJoystickBySDLID(event.jhat.which);
if (joystick) {
joystick->SetHat(event.jhat.hat, event.jhat.value);
}
break;
}
case SDL_JOYAXISMOTION: {
auto joystick = GetSDLJoystickBySDLID(event.jaxis.which);
if (joystick) {
joystick->SetAxis(event.jaxis.axis, event.jaxis.value);
}
break;
}
case SDL_JOYDEVICEREMOVED:
LOG_DEBUG(Input, "Controller removed with Instance_ID {}", event.jdevice.which);
CloseJoystick(SDL_JoystickFromInstanceID(event.jdevice.which));
break;
case SDL_JOYDEVICEADDED:
LOG_DEBUG(Input, "Controller connected with device index {}", event.jdevice.which);
InitJoystick(event.jdevice.which);
break;
}
}
void CloseSDLJoysticks() {
std::lock_guard<std::mutex> lock(joystick_map_mutex);
joystick_map.clear();
}
void PollLoop() {
if (SDL_Init(SDL_INIT_JOYSTICK) < 0) {
LOG_CRITICAL(Input, "SDL_Init(SDL_INIT_JOYSTICK) failed with: {}", SDL_GetError());
return;
}
SDL_Event event;
while (initialized) {
// Wait for 10 ms or until an event happens
if (SDL_WaitEventTimeout(&event, 10)) {
// Don't handle the event if we are configuring
if (polling) {
event_queue.Push(event);
} else {
HandleGameControllerEvent(event);
}
}
}
CloseSDLJoysticks();
SDL_QuitSubSystem(SDL_INIT_JOYSTICK);
}
class SDLButton final : public Input::ButtonDevice {
public:
explicit SDLButton(std::shared_ptr<SDLJoystick> joystick_, int button_)
: joystick(std::move(joystick_)), button(button_) {}
bool GetStatus() const override {
return joystick->GetButton(button);
}
private:
std::shared_ptr<SDLJoystick> joystick;
int button;
};
class SDLDirectionButton final : public Input::ButtonDevice {
public:
explicit SDLDirectionButton(std::shared_ptr<SDLJoystick> joystick_, int hat_, Uint8 direction_)
: joystick(std::move(joystick_)), hat(hat_), direction(direction_) {}
bool GetStatus() const override {
return joystick->GetHatDirection(hat, direction);
}
private:
std::shared_ptr<SDLJoystick> joystick;
int hat;
Uint8 direction;
};
class SDLAxisButton final : public Input::ButtonDevice {
public:
explicit SDLAxisButton(std::shared_ptr<SDLJoystick> joystick_, int axis_, float threshold_,
bool trigger_if_greater_)
: joystick(std::move(joystick_)), axis(axis_), threshold(threshold_),
trigger_if_greater(trigger_if_greater_) {}
bool GetStatus() const override {
float axis_value = joystick->GetAxis(axis);
if (trigger_if_greater)
return axis_value > threshold;
return axis_value < threshold;
}
private:
std::shared_ptr<SDLJoystick> joystick;
int axis;
float threshold;
bool trigger_if_greater;
};
class SDLAnalog final : public Input::AnalogDevice {
public:
SDLAnalog(std::shared_ptr<SDLJoystick> joystick_, int axis_x_, int axis_y_)
: joystick(std::move(joystick_)), axis_x(axis_x_), axis_y(axis_y_) {}
std::tuple<float, float> GetStatus() const override {
return joystick->GetAnalog(axis_x, axis_y);
}
private:
std::shared_ptr<SDLJoystick> joystick;
int axis_x;
int axis_y;
};
/// A button device factory that creates button devices from SDL joystick
class SDLButtonFactory final : public Input::Factory<Input::ButtonDevice> {
public:
/**
* Creates a button device from a joystick button
* @param params contains parameters for creating the device:
* - "guid": the guid of the joystick to bind
* - "port": the nth joystick of the same type to bind
* - "button"(optional): the index of the button to bind
* - "hat"(optional): the index of the hat to bind as direction buttons
* - "axis"(optional): the index of the axis to bind
* - "direction"(only used for hat): the direction name of the hat to bind. Can be "up",
* "down", "left" or "right"
* - "threshold"(only used for axis): a float value in (-1.0, 1.0) which the button is
* triggered if the axis value crosses
* - "direction"(only used for axis): "+" means the button is triggered when the axis
* value is greater than the threshold; "-" means the button is triggered when the axis
* value is smaller than the threshold
*/
std::unique_ptr<Input::ButtonDevice> Create(const Common::ParamPackage& params) override {
const std::string guid = params.Get("guid", "0");
const int port = params.Get("port", 0);
auto joystick = GetSDLJoystickByGUID(guid, port);
if (params.Has("hat")) {
const int hat = params.Get("hat", 0);
const std::string direction_name = params.Get("direction", "");
Uint8 direction;
if (direction_name == "up") {
direction = SDL_HAT_UP;
} else if (direction_name == "down") {
direction = SDL_HAT_DOWN;
} else if (direction_name == "left") {
direction = SDL_HAT_LEFT;
} else if (direction_name == "right") {
direction = SDL_HAT_RIGHT;
} else {
direction = 0;
}
// This is necessary so accessing GetHat with hat won't crash
joystick->SetHat(hat, SDL_HAT_CENTERED);
return std::make_unique<SDLDirectionButton>(joystick, hat, direction);
}
if (params.Has("axis")) {
const int axis = params.Get("axis", 0);
const float threshold = params.Get("threshold", 0.5f);
const std::string direction_name = params.Get("direction", "");
bool trigger_if_greater;
if (direction_name == "+") {
trigger_if_greater = true;
} else if (direction_name == "-") {
trigger_if_greater = false;
} else {
trigger_if_greater = true;
LOG_ERROR(Input, "Unknown direction '{}'", direction_name);
}
// This is necessary so accessing GetAxis with axis won't crash
joystick->SetAxis(axis, 0);
return std::make_unique<SDLAxisButton>(joystick, axis, threshold, trigger_if_greater);
}
const int button = params.Get("button", 0);
// This is necessary so accessing GetButton with button won't crash
joystick->SetButton(button, false);
return std::make_unique<SDLButton>(joystick, button);
}
};
/// An analog device factory that creates analog devices from SDL joystick
class SDLAnalogFactory final : public Input::Factory<Input::AnalogDevice> {
public:
/**
* Creates analog device from joystick axes
* @param params contains parameters for creating the device:
* - "guid": the guid of the joystick to bind
* - "port": the nth joystick of the same type
* - "axis_x": the index of the axis to be bind as x-axis
* - "axis_y": the index of the axis to be bind as y-axis
*/
std::unique_ptr<Input::AnalogDevice> Create(const Common::ParamPackage& params) override {
const std::string guid = params.Get("guid", "0");
const int port = params.Get("port", 0);
const int axis_x = params.Get("axis_x", 0);
const int axis_y = params.Get("axis_y", 1);
auto joystick = GetSDLJoystickByGUID(guid, port);
// This is necessary so accessing GetAxis with axis_x and axis_y won't crash
joystick->SetAxis(axis_x, 0);
joystick->SetAxis(axis_y, 0);
return std::make_unique<SDLAnalog>(joystick, axis_x, axis_y);
}
};
void Init() {
using namespace Input;
RegisterFactory<ButtonDevice>("sdl", std::make_shared<SDLButtonFactory>());
RegisterFactory<AnalogDevice>("sdl", std::make_shared<SDLAnalogFactory>());
polling = false;
initialized = true;
}
void Shutdown() {
if (initialized) {
using namespace Input;
UnregisterFactory<ButtonDevice>("sdl");
UnregisterFactory<AnalogDevice>("sdl");
initialized = false;
}
}
Common::ParamPackage SDLEventToButtonParamPackage(const SDL_Event& event) {
Common::ParamPackage params({{"engine", "sdl"}});
switch (event.type) {
case SDL_JOYAXISMOTION: {
auto joystick = GetSDLJoystickBySDLID(event.jaxis.which);
params.Set("port", joystick->GetPort());
params.Set("guid", joystick->GetGUID());
params.Set("axis", event.jaxis.axis);
if (event.jaxis.value > 0) {
params.Set("direction", "+");
params.Set("threshold", "0.5");
} else {
params.Set("direction", "-");
params.Set("threshold", "-0.5");
}
break;
}
case SDL_JOYBUTTONUP: {
auto joystick = GetSDLJoystickBySDLID(event.jbutton.which);
params.Set("port", joystick->GetPort());
params.Set("guid", joystick->GetGUID());
params.Set("button", event.jbutton.button);
break;
}
case SDL_JOYHATMOTION: {
auto joystick = GetSDLJoystickBySDLID(event.jhat.which);
params.Set("port", joystick->GetPort());
params.Set("guid", joystick->GetGUID());
params.Set("hat", event.jhat.hat);
switch (event.jhat.value) {
case SDL_HAT_UP:
params.Set("direction", "up");
break;
case SDL_HAT_DOWN:
params.Set("direction", "down");
break;
case SDL_HAT_LEFT:
params.Set("direction", "left");
break;
case SDL_HAT_RIGHT:
params.Set("direction", "right");
break;
default:
return {};
}
break;
}
}
return params;
}
namespace Polling {
class SDLPoller : public InputCommon::Polling::DevicePoller {
public:
void Start() override {
event_queue.Clear();
polling = true;
}
void Stop() override {
polling = false;
}
};
class SDLButtonPoller final : public SDLPoller {
public:
Common::ParamPackage GetNextInput() override {
SDL_Event event;
while (event_queue.Pop(event)) {
switch (event.type) {
case SDL_JOYAXISMOTION:
if (std::abs(event.jaxis.value / 32767.0) < 0.5) {
break;
}
case SDL_JOYBUTTONUP:
case SDL_JOYHATMOTION:
return SDLEventToButtonParamPackage(event);
}
}
return {};
}
};
class SDLAnalogPoller final : public SDLPoller {
public:
void Start() override {
SDLPoller::Start();
// Reset stored axes
analog_xaxis = -1;
analog_yaxis = -1;
analog_axes_joystick = -1;
}
Common::ParamPackage GetNextInput() override {
SDL_Event event;
while (event_queue.Pop(event)) {
if (event.type != SDL_JOYAXISMOTION || std::abs(event.jaxis.value / 32767.0) < 0.5) {
continue;
}
// An analog device needs two axes, so we need to store the axis for later and wait for
// a second SDL event. The axes also must be from the same joystick.
int axis = event.jaxis.axis;
if (analog_xaxis == -1) {
analog_xaxis = axis;
analog_axes_joystick = event.jaxis.which;
} else if (analog_yaxis == -1 && analog_xaxis != axis &&
analog_axes_joystick == event.jaxis.which) {
analog_yaxis = axis;
}
}
Common::ParamPackage params;
if (analog_xaxis != -1 && analog_yaxis != -1) {
auto joystick = GetSDLJoystickBySDLID(event.jaxis.which);
params.Set("engine", "sdl");
params.Set("port", joystick->GetPort());
params.Set("guid", joystick->GetGUID());
params.Set("axis_x", analog_xaxis);
params.Set("axis_y", analog_yaxis);
analog_xaxis = -1;
analog_yaxis = -1;
analog_axes_joystick = -1;
return params;
}
return params;
}
private:
int analog_xaxis = -1;
int analog_yaxis = -1;
SDL_JoystickID analog_axes_joystick = -1;
};
std::vector<std::unique_ptr<InputCommon::Polling::DevicePoller>> GetPollers(
InputCommon::Polling::DeviceType type) {
std::vector<std::unique_ptr<InputCommon::Polling::DevicePoller>> pollers;
switch (type) {
case InputCommon::Polling::DeviceType::Analog:
pollers.push_back(std::make_unique<SDLAnalogPoller>());
break;
case InputCommon::Polling::DeviceType::Button:
pollers.push_back(std::make_unique<SDLButtonPoller>());
break;
}
return pollers;
}
} // namespace Polling
} // namespace SDL
} // namespace InputCommon
} // namespace InputCommon::SDL

View File

@@ -1,4 +1,4 @@
// Copyright 2017 Citra Emulator Project
// Copyright 2018 Citra Emulator Project
// Licensed under GPLv2 or any later version
// Refer to the license.txt file included.
@@ -7,45 +7,36 @@
#include <memory>
#include <vector>
#include "core/frontend/input.h"
#include "input_common/main.h"
union SDL_Event;
namespace Common {
class ParamPackage;
}
namespace InputCommon {
namespace Polling {
} // namespace Common
namespace InputCommon::Polling {
class DevicePoller;
enum class DeviceType;
} // namespace Polling
} // namespace InputCommon
} // namespace InputCommon::Polling
namespace InputCommon {
namespace SDL {
namespace InputCommon::SDL {
/// Initializes and registers SDL device factories
void Init();
class State {
public:
/// Unresisters SDL device factories and shut them down.
virtual ~State() = default;
/// Unresisters SDL device factories and shut them down.
void Shutdown();
virtual std::vector<std::unique_ptr<InputCommon::Polling::DevicePoller>> GetPollers(
InputCommon::Polling::DeviceType type) = 0;
};
/// Needs to be called before SDL_QuitSubSystem.
void CloseSDLJoysticks();
class NullState : public State {
public:
std::vector<std::unique_ptr<InputCommon::Polling::DevicePoller>> GetPollers(
InputCommon::Polling::DeviceType type) override {}
};
/// Handle SDL_Events for joysticks from SDL_PollEvent
void HandleGameControllerEvent(const SDL_Event& event);
std::unique_ptr<State> Init();
/// A Loop that calls HandleGameControllerEvent until Shutdown is called
void PollLoop();
/// Creates a ParamPackage from an SDL_Event that can directly be used to create a ButtonDevice
Common::ParamPackage SDLEventToButtonParamPackage(const SDL_Event& event);
namespace Polling {
/// Get all DevicePoller that use the SDL backend for a specific device type
std::vector<std::unique_ptr<InputCommon::Polling::DevicePoller>> GetPollers(
InputCommon::Polling::DeviceType type);
} // namespace Polling
} // namespace SDL
} // namespace InputCommon
} // namespace InputCommon::SDL

View File

@@ -0,0 +1,669 @@
// Copyright 2018 Citra Emulator Project
// Licensed under GPLv2 or any later version
// Refer to the license.txt file included.
#include <algorithm>
#include <atomic>
#include <cmath>
#include <functional>
#include <iterator>
#include <mutex>
#include <string>
#include <thread>
#include <tuple>
#include <unordered_map>
#include <utility>
#include <vector>
#include <SDL.h>
#include "common/assert.h"
#include "common/logging/log.h"
#include "common/math_util.h"
#include "common/param_package.h"
#include "common/threadsafe_queue.h"
#include "core/frontend/input.h"
#include "input_common/sdl/sdl_impl.h"
namespace InputCommon {
namespace SDL {
static std::string GetGUID(SDL_Joystick* joystick) {
SDL_JoystickGUID guid = SDL_JoystickGetGUID(joystick);
char guid_str[33];
SDL_JoystickGetGUIDString(guid, guid_str, sizeof(guid_str));
return guid_str;
}
/// Creates a ParamPackage from an SDL_Event that can directly be used to create a ButtonDevice
static Common::ParamPackage SDLEventToButtonParamPackage(SDLState& state, const SDL_Event& event);
static int SDLEventWatcher(void* userdata, SDL_Event* event) {
SDLState* sdl_state = reinterpret_cast<SDLState*>(userdata);
// Don't handle the event if we are configuring
if (sdl_state->polling) {
sdl_state->event_queue.Push(*event);
} else {
sdl_state->HandleGameControllerEvent(*event);
}
return 0;
}
class SDLJoystick {
public:
SDLJoystick(std::string guid_, int port_, SDL_Joystick* joystick,
decltype(&SDL_JoystickClose) deleter = &SDL_JoystickClose)
: guid{std::move(guid_)}, port{port_}, sdl_joystick{joystick, deleter} {}
void SetButton(int button, bool value) {
std::lock_guard<std::mutex> lock(mutex);
state.buttons[button] = value;
}
bool GetButton(int button) const {
std::lock_guard<std::mutex> lock(mutex);
return state.buttons.at(button);
}
void SetAxis(int axis, Sint16 value) {
std::lock_guard<std::mutex> lock(mutex);
state.axes[axis] = value;
}
float GetAxis(int axis) const {
std::lock_guard<std::mutex> lock(mutex);
return state.axes.at(axis) / 32767.0f;
}
std::tuple<float, float> GetAnalog(int axis_x, int axis_y) const {
float x = GetAxis(axis_x);
float y = GetAxis(axis_y);
y = -y; // 3DS uses an y-axis inverse from SDL
// Make sure the coordinates are in the unit circle,
// otherwise normalize it.
float r = x * x + y * y;
if (r > 1.0f) {
r = std::sqrt(r);
x /= r;
y /= r;
}
return std::make_tuple(x, y);
}
void SetHat(int hat, Uint8 direction) {
std::lock_guard<std::mutex> lock(mutex);
state.hats[hat] = direction;
}
bool GetHatDirection(int hat, Uint8 direction) const {
std::lock_guard<std::mutex> lock(mutex);
return (state.hats.at(hat) & direction) != 0;
}
/**
* The guid of the joystick
*/
const std::string& GetGUID() const {
return guid;
}
/**
* The number of joystick from the same type that were connected before this joystick
*/
int GetPort() const {
return port;
}
SDL_Joystick* GetSDLJoystick() const {
return sdl_joystick.get();
}
void SetSDLJoystick(SDL_Joystick* joystick,
decltype(&SDL_JoystickClose) deleter = &SDL_JoystickClose) {
sdl_joystick =
std::unique_ptr<SDL_Joystick, decltype(&SDL_JoystickClose)>(joystick, deleter);
}
private:
struct State {
std::unordered_map<int, bool> buttons;
std::unordered_map<int, Sint16> axes;
std::unordered_map<int, Uint8> hats;
} state;
std::string guid;
int port;
std::unique_ptr<SDL_Joystick, decltype(&SDL_JoystickClose)> sdl_joystick;
mutable std::mutex mutex;
};
/**
* Get the nth joystick with the corresponding GUID
*/
std::shared_ptr<SDLJoystick> SDLState::GetSDLJoystickByGUID(const std::string& guid, int port) {
std::lock_guard<std::mutex> lock(joystick_map_mutex);
const auto it = joystick_map.find(guid);
if (it != joystick_map.end()) {
while (it->second.size() <= port) {
auto joystick = std::make_shared<SDLJoystick>(guid, it->second.size(), nullptr,
[](SDL_Joystick*) {});
it->second.emplace_back(std::move(joystick));
}
return it->second[port];
}
auto joystick = std::make_shared<SDLJoystick>(guid, 0, nullptr, [](SDL_Joystick*) {});
return joystick_map[guid].emplace_back(std::move(joystick));
}
/**
* Check how many identical joysticks (by guid) were connected before the one with sdl_id and so tie
* it to a SDLJoystick with the same guid and that port
*/
std::shared_ptr<SDLJoystick> SDLState::GetSDLJoystickBySDLID(SDL_JoystickID sdl_id) {
auto sdl_joystick = SDL_JoystickFromInstanceID(sdl_id);
const std::string guid = GetGUID(sdl_joystick);
std::lock_guard<std::mutex> lock(joystick_map_mutex);
auto map_it = joystick_map.find(guid);
if (map_it != joystick_map.end()) {
auto vec_it = std::find_if(map_it->second.begin(), map_it->second.end(),
[&sdl_joystick](const std::shared_ptr<SDLJoystick>& joystick) {
return sdl_joystick == joystick->GetSDLJoystick();
});
if (vec_it != map_it->second.end()) {
// This is the common case: There is already an existing SDL_Joystick maped to a
// SDLJoystick. return the SDLJoystick
return *vec_it;
}
// Search for a SDLJoystick without a mapped SDL_Joystick...
auto nullptr_it = std::find_if(map_it->second.begin(), map_it->second.end(),
[](const std::shared_ptr<SDLJoystick>& joystick) {
return !joystick->GetSDLJoystick();
});
if (nullptr_it != map_it->second.end()) {
// ... and map it
(*nullptr_it)->SetSDLJoystick(sdl_joystick);
return *nullptr_it;
}
// There is no SDLJoystick without a mapped SDL_Joystick
// Create a new SDLJoystick
auto joystick = std::make_shared<SDLJoystick>(guid, map_it->second.size(), sdl_joystick);
return map_it->second.emplace_back(std::move(joystick));
}
auto joystick = std::make_shared<SDLJoystick>(guid, 0, sdl_joystick);
return joystick_map[guid].emplace_back(std::move(joystick));
}
void SDLState::InitJoystick(int joystick_index) {
SDL_Joystick* sdl_joystick = SDL_JoystickOpen(joystick_index);
if (!sdl_joystick) {
LOG_ERROR(Input, "failed to open joystick {}", joystick_index);
return;
}
std::string guid = GetGUID(sdl_joystick);
std::lock_guard<std::mutex> lock(joystick_map_mutex);
if (joystick_map.find(guid) == joystick_map.end()) {
auto joystick = std::make_shared<SDLJoystick>(guid, 0, sdl_joystick);
joystick_map[guid].emplace_back(std::move(joystick));
return;
}
auto& joystick_guid_list = joystick_map[guid];
const auto it = std::find_if(
joystick_guid_list.begin(), joystick_guid_list.end(),
[](const std::shared_ptr<SDLJoystick>& joystick) { return !joystick->GetSDLJoystick(); });
if (it != joystick_guid_list.end()) {
(*it)->SetSDLJoystick(sdl_joystick);
return;
}
auto joystick = std::make_shared<SDLJoystick>(guid, joystick_guid_list.size(), sdl_joystick);
joystick_guid_list.emplace_back(std::move(joystick));
}
void SDLState::CloseJoystick(SDL_Joystick* sdl_joystick) {
std::string guid = GetGUID(sdl_joystick);
std::shared_ptr<SDLJoystick> joystick;
{
std::lock_guard<std::mutex> lock(joystick_map_mutex);
// This call to guid is safe since the joystick is guaranteed to be in the map
auto& joystick_guid_list = joystick_map[guid];
const auto joystick_it =
std::find_if(joystick_guid_list.begin(), joystick_guid_list.end(),
[&sdl_joystick](const std::shared_ptr<SDLJoystick>& joystick) {
return joystick->GetSDLJoystick() == sdl_joystick;
});
joystick = *joystick_it;
}
// Destruct SDL_Joystick outside the lock guard because SDL can internally call event calback
// which locks the mutex again
joystick->SetSDLJoystick(nullptr, [](SDL_Joystick*) {});
}
void SDLState::HandleGameControllerEvent(const SDL_Event& event) {
switch (event.type) {
case SDL_JOYBUTTONUP: {
if (auto joystick = GetSDLJoystickBySDLID(event.jbutton.which)) {
joystick->SetButton(event.jbutton.button, false);
}
break;
}
case SDL_JOYBUTTONDOWN: {
if (auto joystick = GetSDLJoystickBySDLID(event.jbutton.which)) {
joystick->SetButton(event.jbutton.button, true);
}
break;
}
case SDL_JOYHATMOTION: {
if (auto joystick = GetSDLJoystickBySDLID(event.jhat.which)) {
joystick->SetHat(event.jhat.hat, event.jhat.value);
}
break;
}
case SDL_JOYAXISMOTION: {
if (auto joystick = GetSDLJoystickBySDLID(event.jaxis.which)) {
joystick->SetAxis(event.jaxis.axis, event.jaxis.value);
}
break;
}
case SDL_JOYDEVICEREMOVED:
LOG_DEBUG(Input, "Controller removed with Instance_ID {}", event.jdevice.which);
CloseJoystick(SDL_JoystickFromInstanceID(event.jdevice.which));
break;
case SDL_JOYDEVICEADDED:
LOG_DEBUG(Input, "Controller connected with device index {}", event.jdevice.which);
InitJoystick(event.jdevice.which);
break;
}
}
void SDLState::CloseJoysticks() {
std::lock_guard<std::mutex> lock(joystick_map_mutex);
joystick_map.clear();
}
class SDLButton final : public Input::ButtonDevice {
public:
explicit SDLButton(std::shared_ptr<SDLJoystick> joystick_, int button_)
: joystick(std::move(joystick_)), button(button_) {}
bool GetStatus() const override {
return joystick->GetButton(button);
}
private:
std::shared_ptr<SDLJoystick> joystick;
int button;
};
class SDLDirectionButton final : public Input::ButtonDevice {
public:
explicit SDLDirectionButton(std::shared_ptr<SDLJoystick> joystick_, int hat_, Uint8 direction_)
: joystick(std::move(joystick_)), hat(hat_), direction(direction_) {}
bool GetStatus() const override {
return joystick->GetHatDirection(hat, direction);
}
private:
std::shared_ptr<SDLJoystick> joystick;
int hat;
Uint8 direction;
};
class SDLAxisButton final : public Input::ButtonDevice {
public:
explicit SDLAxisButton(std::shared_ptr<SDLJoystick> joystick_, int axis_, float threshold_,
bool trigger_if_greater_)
: joystick(std::move(joystick_)), axis(axis_), threshold(threshold_),
trigger_if_greater(trigger_if_greater_) {}
bool GetStatus() const override {
float axis_value = joystick->GetAxis(axis);
if (trigger_if_greater)
return axis_value > threshold;
return axis_value < threshold;
}
private:
std::shared_ptr<SDLJoystick> joystick;
int axis;
float threshold;
bool trigger_if_greater;
};
class SDLAnalog final : public Input::AnalogDevice {
public:
SDLAnalog(std::shared_ptr<SDLJoystick> joystick_, int axis_x_, int axis_y_, float deadzone_)
: joystick(std::move(joystick_)), axis_x(axis_x_), axis_y(axis_y_), deadzone(deadzone_) {}
std::tuple<float, float> GetStatus() const override {
const auto [x, y] = joystick->GetAnalog(axis_x, axis_y);
const float r = std::sqrt((x * x) + (y * y));
if (r > deadzone) {
return std::make_tuple(x / r * (r - deadzone) / (1 - deadzone),
y / r * (r - deadzone) / (1 - deadzone));
}
return std::make_tuple<float, float>(0.0f, 0.0f);
}
private:
std::shared_ptr<SDLJoystick> joystick;
const int axis_x;
const int axis_y;
const float deadzone;
};
/// A button device factory that creates button devices from SDL joystick
class SDLButtonFactory final : public Input::Factory<Input::ButtonDevice> {
public:
explicit SDLButtonFactory(SDLState& state_) : state(state_) {}
/**
* Creates a button device from a joystick button
* @param params contains parameters for creating the device:
* - "guid": the guid of the joystick to bind
* - "port": the nth joystick of the same type to bind
* - "button"(optional): the index of the button to bind
* - "hat"(optional): the index of the hat to bind as direction buttons
* - "axis"(optional): the index of the axis to bind
* - "direction"(only used for hat): the direction name of the hat to bind. Can be "up",
* "down", "left" or "right"
* - "threshold"(only used for axis): a float value in (-1.0, 1.0) which the button is
* triggered if the axis value crosses
* - "direction"(only used for axis): "+" means the button is triggered when the axis
* value is greater than the threshold; "-" means the button is triggered when the axis
* value is smaller than the threshold
*/
std::unique_ptr<Input::ButtonDevice> Create(const Common::ParamPackage& params) override {
const std::string guid = params.Get("guid", "0");
const int port = params.Get("port", 0);
auto joystick = state.GetSDLJoystickByGUID(guid, port);
if (params.Has("hat")) {
const int hat = params.Get("hat", 0);
const std::string direction_name = params.Get("direction", "");
Uint8 direction;
if (direction_name == "up") {
direction = SDL_HAT_UP;
} else if (direction_name == "down") {
direction = SDL_HAT_DOWN;
} else if (direction_name == "left") {
direction = SDL_HAT_LEFT;
} else if (direction_name == "right") {
direction = SDL_HAT_RIGHT;
} else {
direction = 0;
}
// This is necessary so accessing GetHat with hat won't crash
joystick->SetHat(hat, SDL_HAT_CENTERED);
return std::make_unique<SDLDirectionButton>(joystick, hat, direction);
}
if (params.Has("axis")) {
const int axis = params.Get("axis", 0);
const float threshold = params.Get("threshold", 0.5f);
const std::string direction_name = params.Get("direction", "");
bool trigger_if_greater;
if (direction_name == "+") {
trigger_if_greater = true;
} else if (direction_name == "-") {
trigger_if_greater = false;
} else {
trigger_if_greater = true;
LOG_ERROR(Input, "Unknown direction {}", direction_name);
}
// This is necessary so accessing GetAxis with axis won't crash
joystick->SetAxis(axis, 0);
return std::make_unique<SDLAxisButton>(joystick, axis, threshold, trigger_if_greater);
}
const int button = params.Get("button", 0);
// This is necessary so accessing GetButton with button won't crash
joystick->SetButton(button, false);
return std::make_unique<SDLButton>(joystick, button);
}
private:
SDLState& state;
};
/// An analog device factory that creates analog devices from SDL joystick
class SDLAnalogFactory final : public Input::Factory<Input::AnalogDevice> {
public:
explicit SDLAnalogFactory(SDLState& state_) : state(state_) {}
/**
* Creates analog device from joystick axes
* @param params contains parameters for creating the device:
* - "guid": the guid of the joystick to bind
* - "port": the nth joystick of the same type
* - "axis_x": the index of the axis to be bind as x-axis
* - "axis_y": the index of the axis to be bind as y-axis
*/
std::unique_ptr<Input::AnalogDevice> Create(const Common::ParamPackage& params) override {
const std::string guid = params.Get("guid", "0");
const int port = params.Get("port", 0);
const int axis_x = params.Get("axis_x", 0);
const int axis_y = params.Get("axis_y", 1);
float deadzone = std::clamp(params.Get("deadzone", 0.0f), 0.0f, .99f);
auto joystick = state.GetSDLJoystickByGUID(guid, port);
// This is necessary so accessing GetAxis with axis_x and axis_y won't crash
joystick->SetAxis(axis_x, 0);
joystick->SetAxis(axis_y, 0);
return std::make_unique<SDLAnalog>(joystick, axis_x, axis_y, deadzone);
}
private:
SDLState& state;
};
SDLState::SDLState() {
using namespace Input;
RegisterFactory<ButtonDevice>("sdl", std::make_shared<SDLButtonFactory>(*this));
RegisterFactory<AnalogDevice>("sdl", std::make_shared<SDLAnalogFactory>(*this));
// If the frontend is going to manage the event loop, then we dont start one here
start_thread = !SDL_WasInit(SDL_INIT_JOYSTICK);
if (start_thread && SDL_Init(SDL_INIT_JOYSTICK) < 0) {
LOG_CRITICAL(Input, "SDL_Init(SDL_INIT_JOYSTICK) failed with: {}", SDL_GetError());
return;
}
if (SDL_SetHint(SDL_HINT_JOYSTICK_ALLOW_BACKGROUND_EVENTS, "1") == SDL_FALSE) {
LOG_ERROR(Input, "Failed to set Hint for background events", SDL_GetError());
}
SDL_AddEventWatch(&SDLEventWatcher, this);
initialized = true;
if (start_thread) {
poll_thread = std::thread([&] {
using namespace std::chrono_literals;
SDL_Event event;
while (initialized) {
SDL_PumpEvents();
std::this_thread::sleep_for(std::chrono::duration(10ms));
}
});
}
// Because the events for joystick connection happens before we have our event watcher added, we
// can just open all the joysticks right here
for (int i = 0; i < SDL_NumJoysticks(); ++i) {
InitJoystick(i);
}
}
SDLState::~SDLState() {
using namespace Input;
UnregisterFactory<ButtonDevice>("sdl");
UnregisterFactory<AnalogDevice>("sdl");
CloseJoysticks();
SDL_DelEventWatch(&SDLEventWatcher, this);
initialized = false;
if (start_thread) {
poll_thread.join();
SDL_QuitSubSystem(SDL_INIT_JOYSTICK);
}
}
Common::ParamPackage SDLEventToButtonParamPackage(SDLState& state, const SDL_Event& event) {
Common::ParamPackage params({{"engine", "sdl"}});
switch (event.type) {
case SDL_JOYAXISMOTION: {
auto joystick = state.GetSDLJoystickBySDLID(event.jaxis.which);
params.Set("port", joystick->GetPort());
params.Set("guid", joystick->GetGUID());
params.Set("axis", event.jaxis.axis);
if (event.jaxis.value > 0) {
params.Set("direction", "+");
params.Set("threshold", "0.5");
} else {
params.Set("direction", "-");
params.Set("threshold", "-0.5");
}
break;
}
case SDL_JOYBUTTONUP: {
auto joystick = state.GetSDLJoystickBySDLID(event.jbutton.which);
params.Set("port", joystick->GetPort());
params.Set("guid", joystick->GetGUID());
params.Set("button", event.jbutton.button);
break;
}
case SDL_JOYHATMOTION: {
auto joystick = state.GetSDLJoystickBySDLID(event.jhat.which);
params.Set("port", joystick->GetPort());
params.Set("guid", joystick->GetGUID());
params.Set("hat", event.jhat.hat);
switch (event.jhat.value) {
case SDL_HAT_UP:
params.Set("direction", "up");
break;
case SDL_HAT_DOWN:
params.Set("direction", "down");
break;
case SDL_HAT_LEFT:
params.Set("direction", "left");
break;
case SDL_HAT_RIGHT:
params.Set("direction", "right");
break;
default:
return {};
}
break;
}
}
return params;
}
namespace Polling {
class SDLPoller : public InputCommon::Polling::DevicePoller {
public:
explicit SDLPoller(SDLState& state_) : state(state_) {}
void Start() override {
state.event_queue.Clear();
state.polling = true;
}
void Stop() override {
state.polling = false;
}
protected:
SDLState& state;
};
class SDLButtonPoller final : public SDLPoller {
public:
explicit SDLButtonPoller(SDLState& state_) : SDLPoller(state_) {}
Common::ParamPackage GetNextInput() override {
SDL_Event event;
while (state.event_queue.Pop(event)) {
switch (event.type) {
case SDL_JOYAXISMOTION:
if (std::abs(event.jaxis.value / 32767.0) < 0.5) {
break;
}
case SDL_JOYBUTTONUP:
case SDL_JOYHATMOTION:
return SDLEventToButtonParamPackage(state, event);
}
}
return {};
}
};
class SDLAnalogPoller final : public SDLPoller {
public:
explicit SDLAnalogPoller(SDLState& state_) : SDLPoller(state_) {}
void Start() override {
SDLPoller::Start();
// Reset stored axes
analog_xaxis = -1;
analog_yaxis = -1;
analog_axes_joystick = -1;
}
Common::ParamPackage GetNextInput() override {
SDL_Event event;
while (state.event_queue.Pop(event)) {
if (event.type != SDL_JOYAXISMOTION || std::abs(event.jaxis.value / 32767.0) < 0.5) {
continue;
}
// An analog device needs two axes, so we need to store the axis for later and wait for
// a second SDL event. The axes also must be from the same joystick.
int axis = event.jaxis.axis;
if (analog_xaxis == -1) {
analog_xaxis = axis;
analog_axes_joystick = event.jaxis.which;
} else if (analog_yaxis == -1 && analog_xaxis != axis &&
analog_axes_joystick == event.jaxis.which) {
analog_yaxis = axis;
}
}
Common::ParamPackage params;
if (analog_xaxis != -1 && analog_yaxis != -1) {
auto joystick = state.GetSDLJoystickBySDLID(event.jaxis.which);
params.Set("engine", "sdl");
params.Set("port", joystick->GetPort());
params.Set("guid", joystick->GetGUID());
params.Set("axis_x", analog_xaxis);
params.Set("axis_y", analog_yaxis);
analog_xaxis = -1;
analog_yaxis = -1;
analog_axes_joystick = -1;
return params;
}
return params;
}
private:
int analog_xaxis = -1;
int analog_yaxis = -1;
SDL_JoystickID analog_axes_joystick = -1;
};
} // namespace Polling
std::vector<std::unique_ptr<InputCommon::Polling::DevicePoller>> SDLState::GetPollers(
InputCommon::Polling::DeviceType type) {
std::vector<std::unique_ptr<InputCommon::Polling::DevicePoller>> pollers;
switch (type) {
case InputCommon::Polling::DeviceType::Analog:
pollers.emplace_back(std::make_unique<Polling::SDLAnalogPoller>(*this));
break;
case InputCommon::Polling::DeviceType::Button:
pollers.emplace_back(std::make_unique<Polling::SDLButtonPoller>(*this));
break;
return pollers;
}
}
} // namespace SDL
} // namespace InputCommon

View File

@@ -0,0 +1,64 @@
// Copyright 2018 Citra Emulator Project
// Licensed under GPLv2 or any later version
// Refer to the license.txt file included.
#pragma once
#include <atomic>
#include <memory>
#include <thread>
#include "common/threadsafe_queue.h"
#include "input_common/sdl/sdl.h"
union SDL_Event;
using SDL_Joystick = struct _SDL_Joystick;
using SDL_JoystickID = s32;
namespace InputCommon::SDL {
class SDLJoystick;
class SDLButtonFactory;
class SDLAnalogFactory;
class SDLState : public State {
public:
/// Initializes and registers SDL device factories
SDLState();
/// Unresisters SDL device factories and shut them down.
~SDLState() override;
/// Handle SDL_Events for joysticks from SDL_PollEvent
void HandleGameControllerEvent(const SDL_Event& event);
std::shared_ptr<SDLJoystick> GetSDLJoystickBySDLID(SDL_JoystickID sdl_id);
std::shared_ptr<SDLJoystick> GetSDLJoystickByGUID(const std::string& guid, int port);
/// Get all DevicePoller that use the SDL backend for a specific device type
std::vector<std::unique_ptr<InputCommon::Polling::DevicePoller>> GetPollers(
InputCommon::Polling::DeviceType type) override;
/// Used by the Pollers during config
std::atomic<bool> polling = false;
Common::SPSCQueue<SDL_Event> event_queue;
private:
void InitJoystick(int joystick_index);
void CloseJoystick(SDL_Joystick* sdl_joystick);
/// Needs to be called before SDL_QuitSubSystem.
void CloseJoysticks();
/// Map of GUID of a list of corresponding virtual Joysticks
std::unordered_map<std::string, std::vector<std::shared_ptr<SDLJoystick>>> joystick_map;
std::mutex joystick_map_mutex;
std::shared_ptr<SDLButtonFactory> button_factory;
std::shared_ptr<SDLAnalogFactory> analog_factory;
bool start_thread = false;
std::atomic<bool> initialized = false;
std::thread poll_thread;
};
} // namespace InputCommon::SDL

View File

@@ -15,7 +15,7 @@ namespace ArmTests {
TestEnvironment::TestEnvironment(bool mutable_memory_)
: mutable_memory(mutable_memory_),
test_memory(std::make_shared<TestMemory>(this)), kernel{Core::System::GetInstance()} {
auto process = Kernel::Process::Create(kernel, "");
auto process = Kernel::Process::Create(Core::System::GetInstance(), "");
kernel.MakeCurrentProcess(process.get());
page_table = &process->VMManager().page_table;

View File

@@ -80,6 +80,7 @@ add_library(video_core STATIC
shader/decode/hfma2.cpp
shader/decode/conversion.cpp
shader/decode/memory.cpp
shader/decode/texture.cpp
shader/decode/float_set_predicate.cpp
shader/decode/integer_set_predicate.cpp
shader/decode/half_set_predicate.cpp
@@ -100,6 +101,8 @@ add_library(video_core STATIC
surface.h
textures/astc.cpp
textures/astc.h
textures/convert.cpp
textures/convert.h
textures/decoders.cpp
textures/decoders.h
textures/texture.h
@@ -110,6 +113,8 @@ add_library(video_core STATIC
if (ENABLE_VULKAN)
target_sources(video_core PRIVATE
renderer_vulkan/declarations.h
renderer_vulkan/maxwell_to_vk.cpp
renderer_vulkan/maxwell_to_vk.h
renderer_vulkan/vk_buffer_cache.cpp
renderer_vulkan/vk_buffer_cache.h
renderer_vulkan/vk_device.cpp
@@ -118,6 +123,8 @@ if (ENABLE_VULKAN)
renderer_vulkan/vk_memory_manager.h
renderer_vulkan/vk_resource_manager.cpp
renderer_vulkan/vk_resource_manager.h
renderer_vulkan/vk_sampler_cache.cpp
renderer_vulkan/vk_sampler_cache.h
renderer_vulkan/vk_scheduler.cpp
renderer_vulkan/vk_scheduler.h
renderer_vulkan/vk_stream_buffer.cpp

View File

@@ -39,7 +39,7 @@ bool DmaPusher::Step() {
}
const CommandList& command_list{dma_pushbuffer.front()};
const CommandListHeader& command_list_header{command_list[dma_pushbuffer_subindex++]};
const CommandListHeader command_list_header{command_list[dma_pushbuffer_subindex++]};
GPUVAddr dma_get = command_list_header.addr;
GPUVAddr dma_put = dma_get + command_list_header.size * sizeof(u32);
bool non_main = command_list_header.is_non_main;

View File

@@ -9,6 +9,7 @@
#include "video_core/engines/kepler_memory.h"
#include "video_core/engines/maxwell_3d.h"
#include "video_core/rasterizer_interface.h"
#include "video_core/renderer_base.h"
namespace Tegra::Engines {
@@ -48,7 +49,8 @@ void KeplerMemory::ProcessData(u32 data) {
// We have to invalidate the destination region to evict any outdated surfaces from the cache.
// We do this before actually writing the new data because the destination address might contain
// a dirty surface that will have to be written back to memory.
Core::System::GetInstance().GPU().InvalidateRegion(*dest_address, sizeof(u32));
system.Renderer().Rasterizer().InvalidateRegion(ToCacheAddr(Memory::GetPointer(*dest_address)),
sizeof(u32));
Memory::Write32(*dest_address, data);
system.GPU().Maxwell3D().dirty_flags.OnMemoryWrite();

View File

@@ -396,7 +396,10 @@ void Maxwell3D::ProcessCBData(u32 value) {
const auto address = memory_manager.GpuToCpuAddress(buffer_address + regs.const_buffer.cb_pos);
ASSERT_MSG(address, "Invalid GPU address");
Memory::Write32(*address, value);
u8* ptr{Memory::GetPointer(*address)};
rasterizer.InvalidateRegion(ToCacheAddr(ptr), sizeof(u32));
std::memcpy(ptr, &value, sizeof(u32));
dirty_flags.OnMemoryWrite();
// Increment the current buffer position.

View File

@@ -9,6 +9,7 @@
#include "video_core/engines/maxwell_3d.h"
#include "video_core/engines/maxwell_dma.h"
#include "video_core/rasterizer_interface.h"
#include "video_core/renderer_base.h"
#include "video_core/textures/decoders.h"
namespace Tegra::Engines {
@@ -92,12 +93,14 @@ void MaxwellDMA::HandleCopy() {
const auto FlushAndInvalidate = [&](u32 src_size, u64 dst_size) {
// TODO(Subv): For now, manually flush the regions until we implement GPU-accelerated
// copying.
Core::System::GetInstance().GPU().FlushRegion(*source_cpu, src_size);
Core::System::GetInstance().Renderer().Rasterizer().FlushRegion(
ToCacheAddr(Memory::GetPointer(*source_cpu)), src_size);
// We have to invalidate the destination region to evict any outdated surfaces from the
// cache. We do this before actually writing the new data because the destination address
// might contain a dirty surface that will have to be written back to memory.
Core::System::GetInstance().GPU().InvalidateRegion(*dest_cpu, dst_size);
Core::System::GetInstance().Renderer().Rasterizer().InvalidateRegion(
ToCacheAddr(Memory::GetPointer(*dest_cpu)), dst_size);
};
if (regs.exec.is_dst_linear && !regs.exec.is_src_linear) {

View File

@@ -324,11 +324,11 @@ enum class TextureQueryType : u64 {
enum class TextureProcessMode : u64 {
None = 0,
LZ = 1, // Unknown, appears to be the same as none.
LZ = 1, // Load LOD of zero.
LB = 2, // Load Bias.
LL = 3, // Load LOD (LevelOfDetail)
LBA = 6, // Load Bias. The A is unknown, does not appear to differ with LB
LLA = 7 // Load LOD. The A is unknown, does not appear to differ with LL
LL = 3, // Load LOD.
LBA = 6, // Load Bias. The A is unknown, does not appear to differ with LB.
LLA = 7 // Load LOD. The A is unknown, does not appear to differ with LL.
};
enum class TextureMiscMode : u64 {
@@ -1445,6 +1445,7 @@ public:
Flow,
Synch,
Memory,
Texture,
FloatSet,
FloatSetPredicate,
IntegerSet,
@@ -1575,14 +1576,14 @@ private:
INST("1110111101010---", Id::ST_L, Type::Memory, "ST_L"),
INST("1110111011010---", Id::LDG, Type::Memory, "LDG"),
INST("1110111011011---", Id::STG, Type::Memory, "STG"),
INST("110000----111---", Id::TEX, Type::Memory, "TEX"),
INST("1101111101001---", Id::TXQ, Type::Memory, "TXQ"),
INST("1101-00---------", Id::TEXS, Type::Memory, "TEXS"),
INST("1101101---------", Id::TLDS, Type::Memory, "TLDS"),
INST("110010----111---", Id::TLD4, Type::Memory, "TLD4"),
INST("1101111100------", Id::TLD4S, Type::Memory, "TLD4S"),
INST("110111110110----", Id::TMML_B, Type::Memory, "TMML_B"),
INST("1101111101011---", Id::TMML, Type::Memory, "TMML"),
INST("110000----111---", Id::TEX, Type::Texture, "TEX"),
INST("1101111101001---", Id::TXQ, Type::Texture, "TXQ"),
INST("1101-00---------", Id::TEXS, Type::Texture, "TEXS"),
INST("1101101---------", Id::TLDS, Type::Texture, "TLDS"),
INST("110010----111---", Id::TLD4, Type::Texture, "TLD4"),
INST("1101111100------", Id::TLD4S, Type::Texture, "TLD4S"),
INST("110111110110----", Id::TMML_B, Type::Texture, "TMML_B"),
INST("1101111101011---", Id::TMML, Type::Texture, "TMML"),
INST("111000110000----", Id::EXIT, Type::Trivial, "EXIT"),
INST("11100000--------", Id::IPA, Type::Trivial, "IPA"),
INST("1111101111100---", Id::OUT_R, Type::Trivial, "OUT_R"),

View File

@@ -11,6 +11,11 @@
#include "video_core/dma_pusher.h"
#include "video_core/memory_manager.h"
using CacheAddr = std::uintptr_t;
inline CacheAddr ToCacheAddr(const void* host_ptr) {
return reinterpret_cast<CacheAddr>(host_ptr);
}
namespace Core {
class System;
}
@@ -123,7 +128,7 @@ class GPU {
public:
explicit GPU(Core::System& system, VideoCore::RendererBase& renderer);
~GPU();
virtual ~GPU();
struct MethodCall {
u32 method{};
@@ -209,13 +214,13 @@ public:
std::optional<std::reference_wrapper<const Tegra::FramebufferConfig>> framebuffer) = 0;
/// Notify rasterizer that any caches of the specified region should be flushed to Switch memory
virtual void FlushRegion(VAddr addr, u64 size) = 0;
virtual void FlushRegion(CacheAddr addr, u64 size) = 0;
/// Notify rasterizer that any caches of the specified region should be invalidated
virtual void InvalidateRegion(VAddr addr, u64 size) = 0;
virtual void InvalidateRegion(CacheAddr addr, u64 size) = 0;
/// Notify rasterizer that any caches of the specified region should be flushed and invalidated
virtual void FlushAndInvalidateRegion(VAddr addr, u64 size) = 0;
virtual void FlushAndInvalidateRegion(CacheAddr addr, u64 size) = 0;
private:
void ProcessBindMethod(const MethodCall& method_call);

View File

@@ -22,15 +22,15 @@ void GPUAsynch::SwapBuffers(
gpu_thread.SwapBuffers(std::move(framebuffer));
}
void GPUAsynch::FlushRegion(VAddr addr, u64 size) {
void GPUAsynch::FlushRegion(CacheAddr addr, u64 size) {
gpu_thread.FlushRegion(addr, size);
}
void GPUAsynch::InvalidateRegion(VAddr addr, u64 size) {
void GPUAsynch::InvalidateRegion(CacheAddr addr, u64 size) {
gpu_thread.InvalidateRegion(addr, size);
}
void GPUAsynch::FlushAndInvalidateRegion(VAddr addr, u64 size) {
void GPUAsynch::FlushAndInvalidateRegion(CacheAddr addr, u64 size) {
gpu_thread.FlushAndInvalidateRegion(addr, size);
}

View File

@@ -21,14 +21,14 @@ class ThreadManager;
class GPUAsynch : public Tegra::GPU {
public:
explicit GPUAsynch(Core::System& system, VideoCore::RendererBase& renderer);
~GPUAsynch();
~GPUAsynch() override;
void PushGPUEntries(Tegra::CommandList&& entries) override;
void SwapBuffers(
std::optional<std::reference_wrapper<const Tegra::FramebufferConfig>> framebuffer) override;
void FlushRegion(VAddr addr, u64 size) override;
void InvalidateRegion(VAddr addr, u64 size) override;
void FlushAndInvalidateRegion(VAddr addr, u64 size) override;
void FlushRegion(CacheAddr addr, u64 size) override;
void InvalidateRegion(CacheAddr addr, u64 size) override;
void FlushAndInvalidateRegion(CacheAddr addr, u64 size) override;
private:
GPUThread::ThreadManager gpu_thread;

View File

@@ -22,15 +22,15 @@ void GPUSynch::SwapBuffers(
renderer.SwapBuffers(std::move(framebuffer));
}
void GPUSynch::FlushRegion(VAddr addr, u64 size) {
void GPUSynch::FlushRegion(CacheAddr addr, u64 size) {
renderer.Rasterizer().FlushRegion(addr, size);
}
void GPUSynch::InvalidateRegion(VAddr addr, u64 size) {
void GPUSynch::InvalidateRegion(CacheAddr addr, u64 size) {
renderer.Rasterizer().InvalidateRegion(addr, size);
}
void GPUSynch::FlushAndInvalidateRegion(VAddr addr, u64 size) {
void GPUSynch::FlushAndInvalidateRegion(CacheAddr addr, u64 size) {
renderer.Rasterizer().FlushAndInvalidateRegion(addr, size);
}

View File

@@ -16,14 +16,14 @@ namespace VideoCommon {
class GPUSynch : public Tegra::GPU {
public:
explicit GPUSynch(Core::System& system, VideoCore::RendererBase& renderer);
~GPUSynch();
~GPUSynch() override;
void PushGPUEntries(Tegra::CommandList&& entries) override;
void SwapBuffers(
std::optional<std::reference_wrapper<const Tegra::FramebufferConfig>> framebuffer) override;
void FlushRegion(VAddr addr, u64 size) override;
void InvalidateRegion(VAddr addr, u64 size) override;
void FlushAndInvalidateRegion(VAddr addr, u64 size) override;
void FlushRegion(CacheAddr addr, u64 size) override;
void InvalidateRegion(CacheAddr addr, u64 size) override;
void FlushAndInvalidateRegion(CacheAddr addr, u64 size) override;
};
} // namespace VideoCommon

View File

@@ -5,7 +5,6 @@
#include "common/assert.h"
#include "common/microprofile.h"
#include "core/frontend/scope_acquire_window_context.h"
#include "core/settings.h"
#include "video_core/dma_pusher.h"
#include "video_core/gpu.h"
#include "video_core/gpu_thread.h"
@@ -13,38 +12,13 @@
namespace VideoCommon::GPUThread {
/// Executes a single GPU thread command
static void ExecuteCommand(CommandData* command, VideoCore::RendererBase& renderer,
Tegra::DmaPusher& dma_pusher) {
if (const auto submit_list = std::get_if<SubmitListCommand>(command)) {
dma_pusher.Push(std::move(submit_list->entries));
dma_pusher.DispatchCalls();
} else if (const auto data = std::get_if<SwapBuffersCommand>(command)) {
renderer.SwapBuffers(data->framebuffer);
} else if (const auto data = std::get_if<FlushRegionCommand>(command)) {
renderer.Rasterizer().FlushRegion(data->addr, data->size);
} else if (const auto data = std::get_if<InvalidateRegionCommand>(command)) {
renderer.Rasterizer().InvalidateRegion(data->addr, data->size);
} else if (const auto data = std::get_if<FlushAndInvalidateRegionCommand>(command)) {
renderer.Rasterizer().FlushAndInvalidateRegion(data->addr, data->size);
} else {
UNREACHABLE();
}
}
/// Runs the GPU thread
static void RunThread(VideoCore::RendererBase& renderer, Tegra::DmaPusher& dma_pusher,
SynchState& state) {
MicroProfileOnThreadCreate("GpuThread");
auto WaitForWakeup = [&]() {
std::unique_lock<std::mutex> lock{state.signal_mutex};
state.signal_condition.wait(lock, [&] { return !state.is_idle || !state.is_running; });
};
// Wait for first GPU command before acquiring the window context
WaitForWakeup();
state.WaitForCommands();
// If emulation was stopped during disk shader loading, abort before trying to acquire context
if (!state.is_running) {
@@ -53,100 +27,72 @@ static void RunThread(VideoCore::RendererBase& renderer, Tegra::DmaPusher& dma_p
Core::Frontend::ScopeAcquireWindowContext acquire_context{renderer.GetRenderWindow()};
CommandDataContainer next;
while (state.is_running) {
if (!state.is_running) {
return;
state.WaitForCommands();
while (!state.queue.Empty()) {
state.queue.Pop(next);
if (const auto submit_list = std::get_if<SubmitListCommand>(&next.data)) {
dma_pusher.Push(std::move(submit_list->entries));
dma_pusher.DispatchCalls();
} else if (const auto data = std::get_if<SwapBuffersCommand>(&next.data)) {
state.DecrementFramesCounter();
renderer.SwapBuffers(std::move(data->framebuffer));
} else if (const auto data = std::get_if<FlushRegionCommand>(&next.data)) {
renderer.Rasterizer().FlushRegion(data->addr, data->size);
} else if (const auto data = std::get_if<InvalidateRegionCommand>(&next.data)) {
renderer.Rasterizer().InvalidateRegion(data->addr, data->size);
} else if (const auto data = std::get_if<EndProcessingCommand>(&next.data)) {
return;
} else {
UNREACHABLE();
}
}
{
// Thread has been woken up, so make the previous write queue the next read queue
std::lock_guard<std::mutex> lock{state.signal_mutex};
std::swap(state.push_queue, state.pop_queue);
}
// Execute all of the GPU commands
while (!state.pop_queue->empty()) {
ExecuteCommand(&state.pop_queue->front(), renderer, dma_pusher);
state.pop_queue->pop();
}
state.UpdateIdleState();
// Signal that the GPU thread has finished processing commands
if (state.is_idle) {
state.idle_condition.notify_one();
}
// Wait for CPU thread to send more GPU commands
WaitForWakeup();
}
}
ThreadManager::ThreadManager(VideoCore::RendererBase& renderer, Tegra::DmaPusher& dma_pusher)
: renderer{renderer}, dma_pusher{dma_pusher}, thread{RunThread, std::ref(renderer),
std::ref(dma_pusher), std::ref(state)},
thread_id{thread.get_id()} {}
std::ref(dma_pusher), std::ref(state)} {}
ThreadManager::~ThreadManager() {
{
// Notify GPU thread that a shutdown is pending
std::lock_guard<std::mutex> lock{state.signal_mutex};
state.is_running = false;
}
state.signal_condition.notify_one();
// Notify GPU thread that a shutdown is pending
PushCommand(EndProcessingCommand());
thread.join();
}
void ThreadManager::SubmitList(Tegra::CommandList&& entries) {
if (entries.empty()) {
return;
}
PushCommand(SubmitListCommand(std::move(entries)), false, false);
PushCommand(SubmitListCommand(std::move(entries)));
}
void ThreadManager::SwapBuffers(
std::optional<std::reference_wrapper<const Tegra::FramebufferConfig>> framebuffer) {
PushCommand(SwapBuffersCommand(std::move(framebuffer)), true, false);
state.IncrementFramesCounter();
PushCommand(SwapBuffersCommand(std::move(framebuffer)));
state.WaitForFrames();
}
void ThreadManager::FlushRegion(VAddr addr, u64 size) {
// Block the CPU when using accurate emulation
PushCommand(FlushRegionCommand(addr, size), Settings::values.use_accurate_gpu_emulation, false);
void ThreadManager::FlushRegion(CacheAddr addr, u64 size) {
PushCommand(FlushRegionCommand(addr, size));
}
void ThreadManager::InvalidateRegion(VAddr addr, u64 size) {
PushCommand(InvalidateRegionCommand(addr, size), true, true);
void ThreadManager::InvalidateRegion(CacheAddr addr, u64 size) {
if (state.queue.Empty()) {
// It's quicker to invalidate a single region on the CPU if the queue is already empty
renderer.Rasterizer().InvalidateRegion(addr, size);
} else {
PushCommand(InvalidateRegionCommand(addr, size));
}
}
void ThreadManager::FlushAndInvalidateRegion(VAddr addr, u64 size) {
void ThreadManager::FlushAndInvalidateRegion(CacheAddr addr, u64 size) {
// Skip flush on asynch mode, as FlushAndInvalidateRegion is not used for anything too important
InvalidateRegion(addr, size);
}
void ThreadManager::PushCommand(CommandData&& command_data, bool wait_for_idle, bool allow_on_cpu) {
{
std::lock_guard<std::mutex> lock{state.signal_mutex};
if ((allow_on_cpu && state.is_idle) || IsGpuThread()) {
// Execute the command synchronously on the current thread
ExecuteCommand(&command_data, renderer, dma_pusher);
return;
}
// Push the command to the GPU thread
state.UpdateIdleState();
state.push_queue->emplace(command_data);
}
// Signal the GPU thread that commands are pending
state.signal_condition.notify_one();
if (wait_for_idle) {
// Wait for the GPU to be idle (all commands to be executed)
std::unique_lock<std::mutex> lock{state.idle_mutex};
state.idle_condition.wait(lock, [this] { return static_cast<bool>(state.is_idle); });
}
void ThreadManager::PushCommand(CommandData&& command_data) {
state.queue.Push(CommandDataContainer(std::move(command_data)));
state.SignalCommands();
}
} // namespace VideoCommon::GPUThread

View File

@@ -13,6 +13,9 @@
#include <thread>
#include <variant>
#include "common/threadsafe_queue.h"
#include "video_core/gpu.h"
namespace Tegra {
struct FramebufferConfig;
class DmaPusher;
@@ -24,6 +27,9 @@ class RendererBase;
namespace VideoCommon::GPUThread {
/// Command to signal to the GPU thread that processing has ended
struct EndProcessingCommand final {};
/// Command to signal to the GPU thread that a command list is ready for processing
struct SubmitListCommand final {
explicit SubmitListCommand(Tegra::CommandList&& entries) : entries{std::move(entries)} {}
@@ -36,59 +42,110 @@ struct SwapBuffersCommand final {
explicit SwapBuffersCommand(std::optional<const Tegra::FramebufferConfig> framebuffer)
: framebuffer{std::move(framebuffer)} {}
std::optional<const Tegra::FramebufferConfig> framebuffer;
std::optional<Tegra::FramebufferConfig> framebuffer;
};
/// Command to signal to the GPU thread to flush a region
struct FlushRegionCommand final {
explicit constexpr FlushRegionCommand(VAddr addr, u64 size) : addr{addr}, size{size} {}
explicit constexpr FlushRegionCommand(CacheAddr addr, u64 size) : addr{addr}, size{size} {}
const VAddr addr;
const u64 size;
CacheAddr addr;
u64 size;
};
/// Command to signal to the GPU thread to invalidate a region
struct InvalidateRegionCommand final {
explicit constexpr InvalidateRegionCommand(VAddr addr, u64 size) : addr{addr}, size{size} {}
explicit constexpr InvalidateRegionCommand(CacheAddr addr, u64 size) : addr{addr}, size{size} {}
const VAddr addr;
const u64 size;
CacheAddr addr;
u64 size;
};
/// Command to signal to the GPU thread to flush and invalidate a region
struct FlushAndInvalidateRegionCommand final {
explicit constexpr FlushAndInvalidateRegionCommand(VAddr addr, u64 size)
explicit constexpr FlushAndInvalidateRegionCommand(CacheAddr addr, u64 size)
: addr{addr}, size{size} {}
const VAddr addr;
const u64 size;
CacheAddr addr;
u64 size;
};
using CommandData = std::variant<SubmitListCommand, SwapBuffersCommand, FlushRegionCommand,
InvalidateRegionCommand, FlushAndInvalidateRegionCommand>;
using CommandData =
std::variant<EndProcessingCommand, SubmitListCommand, SwapBuffersCommand, FlushRegionCommand,
InvalidateRegionCommand, FlushAndInvalidateRegionCommand>;
struct CommandDataContainer {
CommandDataContainer() = default;
CommandDataContainer(CommandData&& data) : data{std::move(data)} {}
CommandDataContainer& operator=(const CommandDataContainer& t) {
data = std::move(t.data);
return *this;
}
CommandData data;
};
/// Struct used to synchronize the GPU thread
struct SynchState final {
std::atomic<bool> is_running{true};
std::atomic<bool> is_idle{true};
std::condition_variable signal_condition;
std::mutex signal_mutex;
std::condition_variable idle_condition;
std::mutex idle_mutex;
std::atomic_bool is_running{true};
std::atomic_int queued_frame_count{};
std::mutex frames_mutex;
std::mutex commands_mutex;
std::condition_variable commands_condition;
std::condition_variable frames_condition;
// We use two queues for sending commands to the GPU thread, one for writing (push_queue) to and
// one for reading from (pop_queue). These are swapped whenever the current pop_queue becomes
// empty. This allows for efficient thread-safe access, as it does not require any copies.
using CommandQueue = std::queue<CommandData>;
std::array<CommandQueue, 2> command_queues;
CommandQueue* push_queue{&command_queues[0]};
CommandQueue* pop_queue{&command_queues[1]};
void UpdateIdleState() {
std::lock_guard<std::mutex> lock{idle_mutex};
is_idle = command_queues[0].empty() && command_queues[1].empty();
void IncrementFramesCounter() {
std::lock_guard<std::mutex> lock{frames_mutex};
++queued_frame_count;
}
void DecrementFramesCounter() {
{
std::lock_guard<std::mutex> lock{frames_mutex};
--queued_frame_count;
if (queued_frame_count) {
return;
}
}
frames_condition.notify_one();
}
void WaitForFrames() {
{
std::lock_guard<std::mutex> lock{frames_mutex};
if (!queued_frame_count) {
return;
}
}
// Wait for the GPU to be idle (all commands to be executed)
{
std::unique_lock<std::mutex> lock{frames_mutex};
frames_condition.wait(lock, [this] { return !queued_frame_count; });
}
}
void SignalCommands() {
{
std::unique_lock<std::mutex> lock{commands_mutex};
if (queue.Empty()) {
return;
}
}
commands_condition.notify_one();
}
void WaitForCommands() {
std::unique_lock<std::mutex> lock{commands_mutex};
commands_condition.wait(lock, [this] { return !queue.Empty(); });
}
using CommandQueue = Common::SPSCQueue<CommandDataContainer>;
CommandQueue queue;
};
/// Class used to manage the GPU thread
@@ -105,32 +162,24 @@ public:
std::optional<std::reference_wrapper<const Tegra::FramebufferConfig>> framebuffer);
/// Notify rasterizer that any caches of the specified region should be flushed to Switch memory
void FlushRegion(VAddr addr, u64 size);
void FlushRegion(CacheAddr addr, u64 size);
/// Notify rasterizer that any caches of the specified region should be invalidated
void InvalidateRegion(VAddr addr, u64 size);
void InvalidateRegion(CacheAddr addr, u64 size);
/// Notify rasterizer that any caches of the specified region should be flushed and invalidated
void FlushAndInvalidateRegion(VAddr addr, u64 size);
/// Waits the caller until the GPU thread is idle, used for synchronization
void WaitForIdle();
void FlushAndInvalidateRegion(CacheAddr addr, u64 size);
private:
/// Pushes a command to be executed by the GPU thread
void PushCommand(CommandData&& command_data, bool wait_for_idle, bool allow_on_cpu);
/// Returns true if this is called by the GPU thread
bool IsGpuThread() const {
return std::this_thread::get_id() == thread_id;
}
void PushCommand(CommandData&& command_data);
private:
SynchState state;
std::thread thread;
std::thread::id thread_id;
VideoCore::RendererBase& renderer;
Tegra::DmaPusher& dma_pusher;
std::thread thread;
std::thread::id thread_id;
};
} // namespace VideoCommon::GPUThread

View File

@@ -16,12 +16,12 @@ namespace VideoCore {
using Surface::GetBytesPerPixel;
using Surface::PixelFormat;
using MortonCopyFn = void (*)(u32, u32, u32, u32, u32, u32, u8*, std::size_t, VAddr);
using MortonCopyFn = void (*)(u32, u32, u32, u32, u32, u32, u8*, VAddr);
using ConversionArray = std::array<MortonCopyFn, Surface::MaxPixelFormat>;
template <bool morton_to_linear, PixelFormat format>
static void MortonCopy(u32 stride, u32 block_height, u32 height, u32 block_depth, u32 depth,
u32 tile_width_spacing, u8* buffer, std::size_t buffer_size, VAddr addr) {
u32 tile_width_spacing, u8* buffer, VAddr addr) {
constexpr u32 bytes_per_pixel = GetBytesPerPixel(format);
// With the BCn formats (DXT and DXN), each 4x4 tile is swizzled instead of just individual
@@ -42,142 +42,138 @@ static void MortonCopy(u32 stride, u32 block_height, u32 height, u32 block_depth
}
static constexpr ConversionArray morton_to_linear_fns = {
// clang-format off
MortonCopy<true, PixelFormat::ABGR8U>,
MortonCopy<true, PixelFormat::ABGR8S>,
MortonCopy<true, PixelFormat::ABGR8UI>,
MortonCopy<true, PixelFormat::B5G6R5U>,
MortonCopy<true, PixelFormat::A2B10G10R10U>,
MortonCopy<true, PixelFormat::A1B5G5R5U>,
MortonCopy<true, PixelFormat::R8U>,
MortonCopy<true, PixelFormat::R8UI>,
MortonCopy<true, PixelFormat::RGBA16F>,
MortonCopy<true, PixelFormat::RGBA16U>,
MortonCopy<true, PixelFormat::RGBA16UI>,
MortonCopy<true, PixelFormat::R11FG11FB10F>,
MortonCopy<true, PixelFormat::RGBA32UI>,
MortonCopy<true, PixelFormat::DXT1>,
MortonCopy<true, PixelFormat::DXT23>,
MortonCopy<true, PixelFormat::DXT45>,
MortonCopy<true, PixelFormat::DXN1>,
MortonCopy<true, PixelFormat::DXN2UNORM>,
MortonCopy<true, PixelFormat::DXN2SNORM>,
MortonCopy<true, PixelFormat::BC7U>,
MortonCopy<true, PixelFormat::BC6H_UF16>,
MortonCopy<true, PixelFormat::BC6H_SF16>,
MortonCopy<true, PixelFormat::ASTC_2D_4X4>,
MortonCopy<true, PixelFormat::BGRA8>,
MortonCopy<true, PixelFormat::RGBA32F>,
MortonCopy<true, PixelFormat::RG32F>,
MortonCopy<true, PixelFormat::R32F>,
MortonCopy<true, PixelFormat::R16F>,
MortonCopy<true, PixelFormat::R16U>,
MortonCopy<true, PixelFormat::R16S>,
MortonCopy<true, PixelFormat::R16UI>,
MortonCopy<true, PixelFormat::R16I>,
MortonCopy<true, PixelFormat::RG16>,
MortonCopy<true, PixelFormat::RG16F>,
MortonCopy<true, PixelFormat::RG16UI>,
MortonCopy<true, PixelFormat::RG16I>,
MortonCopy<true, PixelFormat::RG16S>,
MortonCopy<true, PixelFormat::RGB32F>,
MortonCopy<true, PixelFormat::RGBA8_SRGB>,
MortonCopy<true, PixelFormat::RG8U>,
MortonCopy<true, PixelFormat::RG8S>,
MortonCopy<true, PixelFormat::RG32UI>,
MortonCopy<true, PixelFormat::R32UI>,
MortonCopy<true, PixelFormat::ASTC_2D_8X8>,
MortonCopy<true, PixelFormat::ASTC_2D_8X5>,
MortonCopy<true, PixelFormat::ASTC_2D_5X4>,
MortonCopy<true, PixelFormat::BGRA8_SRGB>,
MortonCopy<true, PixelFormat::DXT1_SRGB>,
MortonCopy<true, PixelFormat::DXT23_SRGB>,
MortonCopy<true, PixelFormat::DXT45_SRGB>,
MortonCopy<true, PixelFormat::BC7U_SRGB>,
MortonCopy<true, PixelFormat::ASTC_2D_4X4_SRGB>,
MortonCopy<true, PixelFormat::ASTC_2D_8X8_SRGB>,
MortonCopy<true, PixelFormat::ASTC_2D_8X5_SRGB>,
MortonCopy<true, PixelFormat::ASTC_2D_5X4_SRGB>,
MortonCopy<true, PixelFormat::ASTC_2D_5X5>,
MortonCopy<true, PixelFormat::ASTC_2D_5X5_SRGB>,
MortonCopy<true, PixelFormat::ASTC_2D_10X8>,
MortonCopy<true, PixelFormat::ASTC_2D_10X8_SRGB>,
MortonCopy<true, PixelFormat::Z32F>,
MortonCopy<true, PixelFormat::Z16>,
MortonCopy<true, PixelFormat::Z24S8>,
MortonCopy<true, PixelFormat::S8Z24>,
MortonCopy<true, PixelFormat::Z32FS8>,
// clang-format on
MortonCopy<true, PixelFormat::ABGR8U>,
MortonCopy<true, PixelFormat::ABGR8S>,
MortonCopy<true, PixelFormat::ABGR8UI>,
MortonCopy<true, PixelFormat::B5G6R5U>,
MortonCopy<true, PixelFormat::A2B10G10R10U>,
MortonCopy<true, PixelFormat::A1B5G5R5U>,
MortonCopy<true, PixelFormat::R8U>,
MortonCopy<true, PixelFormat::R8UI>,
MortonCopy<true, PixelFormat::RGBA16F>,
MortonCopy<true, PixelFormat::RGBA16U>,
MortonCopy<true, PixelFormat::RGBA16UI>,
MortonCopy<true, PixelFormat::R11FG11FB10F>,
MortonCopy<true, PixelFormat::RGBA32UI>,
MortonCopy<true, PixelFormat::DXT1>,
MortonCopy<true, PixelFormat::DXT23>,
MortonCopy<true, PixelFormat::DXT45>,
MortonCopy<true, PixelFormat::DXN1>,
MortonCopy<true, PixelFormat::DXN2UNORM>,
MortonCopy<true, PixelFormat::DXN2SNORM>,
MortonCopy<true, PixelFormat::BC7U>,
MortonCopy<true, PixelFormat::BC6H_UF16>,
MortonCopy<true, PixelFormat::BC6H_SF16>,
MortonCopy<true, PixelFormat::ASTC_2D_4X4>,
MortonCopy<true, PixelFormat::BGRA8>,
MortonCopy<true, PixelFormat::RGBA32F>,
MortonCopy<true, PixelFormat::RG32F>,
MortonCopy<true, PixelFormat::R32F>,
MortonCopy<true, PixelFormat::R16F>,
MortonCopy<true, PixelFormat::R16U>,
MortonCopy<true, PixelFormat::R16S>,
MortonCopy<true, PixelFormat::R16UI>,
MortonCopy<true, PixelFormat::R16I>,
MortonCopy<true, PixelFormat::RG16>,
MortonCopy<true, PixelFormat::RG16F>,
MortonCopy<true, PixelFormat::RG16UI>,
MortonCopy<true, PixelFormat::RG16I>,
MortonCopy<true, PixelFormat::RG16S>,
MortonCopy<true, PixelFormat::RGB32F>,
MortonCopy<true, PixelFormat::RGBA8_SRGB>,
MortonCopy<true, PixelFormat::RG8U>,
MortonCopy<true, PixelFormat::RG8S>,
MortonCopy<true, PixelFormat::RG32UI>,
MortonCopy<true, PixelFormat::R32UI>,
MortonCopy<true, PixelFormat::ASTC_2D_8X8>,
MortonCopy<true, PixelFormat::ASTC_2D_8X5>,
MortonCopy<true, PixelFormat::ASTC_2D_5X4>,
MortonCopy<true, PixelFormat::BGRA8_SRGB>,
MortonCopy<true, PixelFormat::DXT1_SRGB>,
MortonCopy<true, PixelFormat::DXT23_SRGB>,
MortonCopy<true, PixelFormat::DXT45_SRGB>,
MortonCopy<true, PixelFormat::BC7U_SRGB>,
MortonCopy<true, PixelFormat::ASTC_2D_4X4_SRGB>,
MortonCopy<true, PixelFormat::ASTC_2D_8X8_SRGB>,
MortonCopy<true, PixelFormat::ASTC_2D_8X5_SRGB>,
MortonCopy<true, PixelFormat::ASTC_2D_5X4_SRGB>,
MortonCopy<true, PixelFormat::ASTC_2D_5X5>,
MortonCopy<true, PixelFormat::ASTC_2D_5X5_SRGB>,
MortonCopy<true, PixelFormat::ASTC_2D_10X8>,
MortonCopy<true, PixelFormat::ASTC_2D_10X8_SRGB>,
MortonCopy<true, PixelFormat::Z32F>,
MortonCopy<true, PixelFormat::Z16>,
MortonCopy<true, PixelFormat::Z24S8>,
MortonCopy<true, PixelFormat::S8Z24>,
MortonCopy<true, PixelFormat::Z32FS8>,
};
static constexpr ConversionArray linear_to_morton_fns = {
// clang-format off
MortonCopy<false, PixelFormat::ABGR8U>,
MortonCopy<false, PixelFormat::ABGR8S>,
MortonCopy<false, PixelFormat::ABGR8UI>,
MortonCopy<false, PixelFormat::B5G6R5U>,
MortonCopy<false, PixelFormat::A2B10G10R10U>,
MortonCopy<false, PixelFormat::A1B5G5R5U>,
MortonCopy<false, PixelFormat::R8U>,
MortonCopy<false, PixelFormat::R8UI>,
MortonCopy<false, PixelFormat::RGBA16F>,
MortonCopy<false, PixelFormat::RGBA16U>,
MortonCopy<false, PixelFormat::RGBA16UI>,
MortonCopy<false, PixelFormat::R11FG11FB10F>,
MortonCopy<false, PixelFormat::RGBA32UI>,
MortonCopy<false, PixelFormat::DXT1>,
MortonCopy<false, PixelFormat::DXT23>,
MortonCopy<false, PixelFormat::DXT45>,
MortonCopy<false, PixelFormat::DXN1>,
MortonCopy<false, PixelFormat::DXN2UNORM>,
MortonCopy<false, PixelFormat::DXN2SNORM>,
MortonCopy<false, PixelFormat::BC7U>,
MortonCopy<false, PixelFormat::BC6H_UF16>,
MortonCopy<false, PixelFormat::BC6H_SF16>,
// TODO(Subv): Swizzling ASTC formats are not supported
nullptr,
MortonCopy<false, PixelFormat::BGRA8>,
MortonCopy<false, PixelFormat::RGBA32F>,
MortonCopy<false, PixelFormat::RG32F>,
MortonCopy<false, PixelFormat::R32F>,
MortonCopy<false, PixelFormat::R16F>,
MortonCopy<false, PixelFormat::R16U>,
MortonCopy<false, PixelFormat::R16S>,
MortonCopy<false, PixelFormat::R16UI>,
MortonCopy<false, PixelFormat::R16I>,
MortonCopy<false, PixelFormat::RG16>,
MortonCopy<false, PixelFormat::RG16F>,
MortonCopy<false, PixelFormat::RG16UI>,
MortonCopy<false, PixelFormat::RG16I>,
MortonCopy<false, PixelFormat::RG16S>,
MortonCopy<false, PixelFormat::RGB32F>,
MortonCopy<false, PixelFormat::RGBA8_SRGB>,
MortonCopy<false, PixelFormat::RG8U>,
MortonCopy<false, PixelFormat::RG8S>,
MortonCopy<false, PixelFormat::RG32UI>,
MortonCopy<false, PixelFormat::R32UI>,
nullptr,
nullptr,
nullptr,
MortonCopy<false, PixelFormat::BGRA8_SRGB>,
MortonCopy<false, PixelFormat::DXT1_SRGB>,
MortonCopy<false, PixelFormat::DXT23_SRGB>,
MortonCopy<false, PixelFormat::DXT45_SRGB>,
MortonCopy<false, PixelFormat::BC7U_SRGB>,
nullptr,
nullptr,
nullptr,
nullptr,
nullptr,
nullptr,
nullptr,
nullptr,
MortonCopy<false, PixelFormat::Z32F>,
MortonCopy<false, PixelFormat::Z16>,
MortonCopy<false, PixelFormat::Z24S8>,
MortonCopy<false, PixelFormat::S8Z24>,
MortonCopy<false, PixelFormat::Z32FS8>,
// clang-format on
MortonCopy<false, PixelFormat::ABGR8U>,
MortonCopy<false, PixelFormat::ABGR8S>,
MortonCopy<false, PixelFormat::ABGR8UI>,
MortonCopy<false, PixelFormat::B5G6R5U>,
MortonCopy<false, PixelFormat::A2B10G10R10U>,
MortonCopy<false, PixelFormat::A1B5G5R5U>,
MortonCopy<false, PixelFormat::R8U>,
MortonCopy<false, PixelFormat::R8UI>,
MortonCopy<false, PixelFormat::RGBA16F>,
MortonCopy<false, PixelFormat::RGBA16U>,
MortonCopy<false, PixelFormat::RGBA16UI>,
MortonCopy<false, PixelFormat::R11FG11FB10F>,
MortonCopy<false, PixelFormat::RGBA32UI>,
MortonCopy<false, PixelFormat::DXT1>,
MortonCopy<false, PixelFormat::DXT23>,
MortonCopy<false, PixelFormat::DXT45>,
MortonCopy<false, PixelFormat::DXN1>,
MortonCopy<false, PixelFormat::DXN2UNORM>,
MortonCopy<false, PixelFormat::DXN2SNORM>,
MortonCopy<false, PixelFormat::BC7U>,
MortonCopy<false, PixelFormat::BC6H_UF16>,
MortonCopy<false, PixelFormat::BC6H_SF16>,
// TODO(Subv): Swizzling ASTC formats are not supported
nullptr,
MortonCopy<false, PixelFormat::BGRA8>,
MortonCopy<false, PixelFormat::RGBA32F>,
MortonCopy<false, PixelFormat::RG32F>,
MortonCopy<false, PixelFormat::R32F>,
MortonCopy<false, PixelFormat::R16F>,
MortonCopy<false, PixelFormat::R16U>,
MortonCopy<false, PixelFormat::R16S>,
MortonCopy<false, PixelFormat::R16UI>,
MortonCopy<false, PixelFormat::R16I>,
MortonCopy<false, PixelFormat::RG16>,
MortonCopy<false, PixelFormat::RG16F>,
MortonCopy<false, PixelFormat::RG16UI>,
MortonCopy<false, PixelFormat::RG16I>,
MortonCopy<false, PixelFormat::RG16S>,
MortonCopy<false, PixelFormat::RGB32F>,
MortonCopy<false, PixelFormat::RGBA8_SRGB>,
MortonCopy<false, PixelFormat::RG8U>,
MortonCopy<false, PixelFormat::RG8S>,
MortonCopy<false, PixelFormat::RG32UI>,
MortonCopy<false, PixelFormat::R32UI>,
nullptr,
nullptr,
nullptr,
MortonCopy<false, PixelFormat::BGRA8_SRGB>,
MortonCopy<false, PixelFormat::DXT1_SRGB>,
MortonCopy<false, PixelFormat::DXT23_SRGB>,
MortonCopy<false, PixelFormat::DXT45_SRGB>,
MortonCopy<false, PixelFormat::BC7U_SRGB>,
nullptr,
nullptr,
nullptr,
nullptr,
nullptr,
nullptr,
nullptr,
nullptr,
MortonCopy<false, PixelFormat::Z32F>,
MortonCopy<false, PixelFormat::Z16>,
MortonCopy<false, PixelFormat::Z24S8>,
MortonCopy<false, PixelFormat::S8Z24>,
MortonCopy<false, PixelFormat::Z32FS8>,
};
static MortonCopyFn GetSwizzleFunction(MortonSwizzleMode mode, Surface::PixelFormat format) {
@@ -191,45 +187,6 @@ static MortonCopyFn GetSwizzleFunction(MortonSwizzleMode mode, Surface::PixelFor
return morton_to_linear_fns[static_cast<std::size_t>(format)];
}
/// 8x8 Z-Order coordinate from 2D coordinates
static u32 MortonInterleave(u32 x, u32 y) {
static const u32 xlut[] = {0x00, 0x01, 0x04, 0x05, 0x10, 0x11, 0x14, 0x15};
static const u32 ylut[] = {0x00, 0x02, 0x08, 0x0a, 0x20, 0x22, 0x28, 0x2a};
return xlut[x % 8] + ylut[y % 8];
}
/// Calculates the offset of the position of the pixel in Morton order
static u32 GetMortonOffset(u32 x, u32 y, u32 bytes_per_pixel) {
// Images are split into 8x8 tiles. Each tile is composed of four 4x4 subtiles each
// of which is composed of four 2x2 subtiles each of which is composed of four texels.
// Each structure is embedded into the next-bigger one in a diagonal pattern, e.g.
// texels are laid out in a 2x2 subtile like this:
// 2 3
// 0 1
//
// The full 8x8 tile has the texels arranged like this:
//
// 42 43 46 47 58 59 62 63
// 40 41 44 45 56 57 60 61
// 34 35 38 39 50 51 54 55
// 32 33 36 37 48 49 52 53
// 10 11 14 15 26 27 30 31
// 08 09 12 13 24 25 28 29
// 02 03 06 07 18 19 22 23
// 00 01 04 05 16 17 20 21
//
// This pattern is what's called Z-order curve, or Morton order.
const unsigned int block_height = 8;
const unsigned int coarse_x = x & ~7;
u32 i = MortonInterleave(x, y);
const unsigned int offset = coarse_x * block_height;
return (i + offset) * bytes_per_pixel;
}
static u32 MortonInterleave128(u32 x, u32 y) {
// 128x128 Z-Order coordinate from 2D coordinates
static constexpr u32 xlut[] = {
@@ -325,14 +282,14 @@ static u32 GetMortonOffset128(u32 x, u32 y, u32 bytes_per_pixel) {
void MortonSwizzle(MortonSwizzleMode mode, Surface::PixelFormat format, u32 stride,
u32 block_height, u32 height, u32 block_depth, u32 depth, u32 tile_width_spacing,
u8* buffer, std::size_t buffer_size, VAddr addr) {
u8* buffer, VAddr addr) {
GetSwizzleFunction(mode, format)(stride, block_height, height, block_depth, depth,
tile_width_spacing, buffer, buffer_size, addr);
tile_width_spacing, buffer, addr);
}
void MortonCopyPixels128(u32 width, u32 height, u32 bytes_per_pixel, u32 linear_bytes_per_pixel,
u8* morton_data, u8* linear_data, bool morton_to_linear) {
void MortonCopyPixels128(MortonSwizzleMode mode, u32 width, u32 height, u32 bytes_per_pixel,
u32 linear_bytes_per_pixel, u8* morton_data, u8* linear_data) {
const bool morton_to_linear = mode == MortonSwizzleMode::MortonToLinear;
u8* data_ptrs[2];
for (u32 y = 0; y < height; ++y) {
for (u32 x = 0; x < width; ++x) {

View File

@@ -13,9 +13,9 @@ enum class MortonSwizzleMode { MortonToLinear, LinearToMorton };
void MortonSwizzle(MortonSwizzleMode mode, VideoCore::Surface::PixelFormat format, u32 stride,
u32 block_height, u32 height, u32 block_depth, u32 depth, u32 tile_width_spacing,
u8* buffer, std::size_t buffer_size, VAddr addr);
u8* buffer, VAddr addr);
void MortonCopyPixels128(u32 width, u32 height, u32 bytes_per_pixel, u32 linear_bytes_per_pixel,
u8* morton_data, u8* linear_data, bool morton_to_linear);
void MortonCopyPixels128(MortonSwizzleMode mode, u32 width, u32 height, u32 bytes_per_pixel,
u32 linear_bytes_per_pixel, u8* morton_data, u8* linear_data);
} // namespace VideoCore

View File

@@ -4,6 +4,7 @@
#pragma once
#include <mutex>
#include <set>
#include <unordered_map>
@@ -12,14 +13,26 @@
#include "common/common_types.h"
#include "core/settings.h"
#include "video_core/gpu.h"
#include "video_core/rasterizer_interface.h"
class RasterizerCacheObject {
public:
explicit RasterizerCacheObject(const u8* host_ptr)
: host_ptr{host_ptr}, cache_addr{ToCacheAddr(host_ptr)} {}
virtual ~RasterizerCacheObject();
CacheAddr GetCacheAddr() const {
return cache_addr;
}
const u8* GetHostPtr() const {
return host_ptr;
}
/// Gets the address of the shader in guest memory, required for cache management
virtual VAddr GetAddr() const = 0;
virtual VAddr GetCpuAddr() const = 0;
/// Gets the size of the shader in guest memory, required for cache management
virtual std::size_t GetSizeInBytes() const = 0;
@@ -58,6 +71,8 @@ private:
bool is_registered{}; ///< Whether the object is currently registered with the cache
bool is_dirty{}; ///< Whether the object is dirty (out of sync with guest memory)
u64 last_modified_ticks{}; ///< When the object was last modified, used for in-order flushing
CacheAddr cache_addr{}; ///< Cache address memory, unique from emulated virtual address space
const u8* host_ptr{}; ///< Pointer to the memory backing this cached region
};
template <class T>
@@ -68,7 +83,9 @@ public:
explicit RasterizerCache(VideoCore::RasterizerInterface& rasterizer) : rasterizer{rasterizer} {}
/// Write any cached resources overlapping the specified region back to memory
void FlushRegion(Tegra::GPUVAddr addr, size_t size) {
void FlushRegion(CacheAddr addr, std::size_t size) {
std::lock_guard<std::recursive_mutex> lock{mutex};
const auto& objects{GetSortedObjectsFromRegion(addr, size)};
for (auto& object : objects) {
FlushObject(object);
@@ -76,7 +93,9 @@ public:
}
/// Mark the specified region as being invalidated
void InvalidateRegion(VAddr addr, u64 size) {
void InvalidateRegion(CacheAddr addr, u64 size) {
std::lock_guard<std::recursive_mutex> lock{mutex};
const auto& objects{GetSortedObjectsFromRegion(addr, size)};
for (auto& object : objects) {
if (!object->IsRegistered()) {
@@ -89,48 +108,60 @@ public:
/// Invalidates everything in the cache
void InvalidateAll() {
std::lock_guard<std::recursive_mutex> lock{mutex};
while (interval_cache.begin() != interval_cache.end()) {
Unregister(*interval_cache.begin()->second.begin());
}
}
protected:
/// Tries to get an object from the cache with the specified address
T TryGet(VAddr addr) const {
/// Tries to get an object from the cache with the specified cache address
T TryGet(CacheAddr addr) const {
const auto iter = map_cache.find(addr);
if (iter != map_cache.end())
return iter->second;
return nullptr;
}
T TryGet(const void* addr) const {
const auto iter = map_cache.find(ToCacheAddr(addr));
if (iter != map_cache.end())
return iter->second;
return nullptr;
}
/// Register an object into the cache
void Register(const T& object) {
std::lock_guard<std::recursive_mutex> lock{mutex};
object->SetIsRegistered(true);
interval_cache.add({GetInterval(object), ObjectSet{object}});
map_cache.insert({object->GetAddr(), object});
rasterizer.UpdatePagesCachedCount(object->GetAddr(), object->GetSizeInBytes(), 1);
map_cache.insert({object->GetCacheAddr(), object});
rasterizer.UpdatePagesCachedCount(object->GetCpuAddr(), object->GetSizeInBytes(), 1);
}
/// Unregisters an object from the cache
void Unregister(const T& object) {
object->SetIsRegistered(false);
rasterizer.UpdatePagesCachedCount(object->GetAddr(), object->GetSizeInBytes(), -1);
// Only flush if use_accurate_gpu_emulation is enabled, as it incurs a performance hit
if (Settings::values.use_accurate_gpu_emulation) {
FlushObject(object);
}
std::lock_guard<std::recursive_mutex> lock{mutex};
object->SetIsRegistered(false);
rasterizer.UpdatePagesCachedCount(object->GetCpuAddr(), object->GetSizeInBytes(), -1);
interval_cache.subtract({GetInterval(object), ObjectSet{object}});
map_cache.erase(object->GetAddr());
map_cache.erase(object->GetCacheAddr());
}
/// Returns a ticks counter used for tracking when cached objects were last modified
u64 GetModifiedTicks() {
std::lock_guard<std::recursive_mutex> lock{mutex};
return ++modified_ticks;
}
/// Flushes the specified object, updating appropriate cache state as needed
void FlushObject(const T& object) {
std::lock_guard<std::recursive_mutex> lock{mutex};
if (!object->IsDirty()) {
return;
}
@@ -140,7 +171,7 @@ protected:
private:
/// Returns a list of cached objects from the specified memory region, ordered by access time
std::vector<T> GetSortedObjectsFromRegion(VAddr addr, u64 size) {
std::vector<T> GetSortedObjectsFromRegion(CacheAddr addr, u64 size) {
if (size == 0) {
return {};
}
@@ -164,17 +195,18 @@ private:
}
using ObjectSet = std::set<T>;
using ObjectCache = std::unordered_map<VAddr, T>;
using IntervalCache = boost::icl::interval_map<VAddr, ObjectSet>;
using ObjectCache = std::unordered_map<CacheAddr, T>;
using IntervalCache = boost::icl::interval_map<CacheAddr, ObjectSet>;
using ObjectInterval = typename IntervalCache::interval_type;
static auto GetInterval(const T& object) {
return ObjectInterval::right_open(object->GetAddr(),
object->GetAddr() + object->GetSizeInBytes());
return ObjectInterval::right_open(object->GetCacheAddr(),
object->GetCacheAddr() + object->GetSizeInBytes());
}
ObjectCache map_cache;
IntervalCache interval_cache; ///< Cache of objects
u64 modified_ticks{}; ///< Counter of cache state ticks, used for in-order flushing
VideoCore::RasterizerInterface& rasterizer;
std::recursive_mutex mutex;
};

View File

@@ -35,14 +35,14 @@ public:
virtual void FlushAll() = 0;
/// Notify rasterizer that any caches of the specified region should be flushed to Switch memory
virtual void FlushRegion(VAddr addr, u64 size) = 0;
virtual void FlushRegion(CacheAddr addr, u64 size) = 0;
/// Notify rasterizer that any caches of the specified region should be invalidated
virtual void InvalidateRegion(VAddr addr, u64 size) = 0;
virtual void InvalidateRegion(CacheAddr addr, u64 size) = 0;
/// Notify rasterizer that any caches of the specified region should be flushed to Switch memory
/// and invalidated
virtual void FlushAndInvalidateRegion(VAddr addr, u64 size) = 0;
virtual void FlushAndInvalidateRegion(CacheAddr addr, u64 size) = 0;
/// Attempt to use a faster method to perform a surface copy
virtual bool AccelerateSurfaceCopy(const Tegra::Engines::Fermi2D::Regs::Surface& src,
@@ -63,7 +63,7 @@ public:
}
/// Increase/decrease the number of object in pages touching the specified region
virtual void UpdatePagesCachedCount(Tegra::GPUVAddr addr, u64 size, int delta) {}
virtual void UpdatePagesCachedCount(VAddr addr, u64 size, int delta) {}
/// Initialize disk cached resources for the game being emulated
virtual void LoadDiskResources(const std::atomic_bool& stop_loading = false,

View File

@@ -13,6 +13,11 @@
namespace OpenGL {
CachedBufferEntry::CachedBufferEntry(VAddr cpu_addr, std::size_t size, GLintptr offset,
std::size_t alignment, u8* host_ptr)
: cpu_addr{cpu_addr}, size{size}, offset{offset}, alignment{alignment}, RasterizerCacheObject{
host_ptr} {}
OGLBufferCache::OGLBufferCache(RasterizerOpenGL& rasterizer, std::size_t size)
: RasterizerCache{rasterizer}, stream_buffer(size, true) {}
@@ -26,11 +31,12 @@ GLintptr OGLBufferCache::UploadMemory(Tegra::GPUVAddr gpu_addr, std::size_t size
// TODO: Figure out which size is the best for given games.
cache &= size >= 2048;
const auto& host_ptr{Memory::GetPointer(*cpu_addr)};
if (cache) {
auto entry = TryGet(*cpu_addr);
auto entry = TryGet(host_ptr);
if (entry) {
if (entry->size >= size && entry->alignment == alignment) {
return entry->offset;
if (entry->GetSize() >= size && entry->GetAlignment() == alignment) {
return entry->GetOffset();
}
Unregister(entry);
}
@@ -39,17 +45,17 @@ GLintptr OGLBufferCache::UploadMemory(Tegra::GPUVAddr gpu_addr, std::size_t size
AlignBuffer(alignment);
const GLintptr uploaded_offset = buffer_offset;
Memory::ReadBlock(*cpu_addr, buffer_ptr, size);
if (!host_ptr) {
return uploaded_offset;
}
std::memcpy(buffer_ptr, host_ptr, size);
buffer_ptr += size;
buffer_offset += size;
if (cache) {
auto entry = std::make_shared<CachedBufferEntry>();
entry->offset = uploaded_offset;
entry->size = size;
entry->alignment = alignment;
entry->addr = *cpu_addr;
auto entry = std::make_shared<CachedBufferEntry>(*cpu_addr, size, uploaded_offset,
alignment, host_ptr);
Register(entry);
}

View File

@@ -17,22 +17,39 @@ namespace OpenGL {
class RasterizerOpenGL;
struct CachedBufferEntry final : public RasterizerCacheObject {
VAddr GetAddr() const override {
return addr;
class CachedBufferEntry final : public RasterizerCacheObject {
public:
explicit CachedBufferEntry(VAddr cpu_addr, std::size_t size, GLintptr offset,
std::size_t alignment, u8* host_ptr);
VAddr GetCpuAddr() const override {
return cpu_addr;
}
std::size_t GetSizeInBytes() const override {
return size;
}
std::size_t GetSize() const {
return size;
}
GLintptr GetOffset() const {
return offset;
}
std::size_t GetAlignment() const {
return alignment;
}
// We do not have to flush this cache as things in it are never modified by us.
void Flush() override {}
VAddr addr;
std::size_t size;
GLintptr offset;
std::size_t alignment;
private:
VAddr cpu_addr{};
std::size_t size{};
GLintptr offset{};
std::size_t alignment{};
};
class OGLBufferCache final : public RasterizerCache<std::shared_ptr<CachedBufferEntry>> {

View File

@@ -15,12 +15,13 @@
namespace OpenGL {
CachedGlobalRegion::CachedGlobalRegion(VAddr addr, u32 size) : addr{addr}, size{size} {
CachedGlobalRegion::CachedGlobalRegion(VAddr cpu_addr, u32 size, u8* host_ptr)
: cpu_addr{cpu_addr}, size{size}, RasterizerCacheObject{host_ptr} {
buffer.Create();
// Bind and unbind the buffer so it gets allocated by the driver
glBindBuffer(GL_SHADER_STORAGE_BUFFER, buffer.handle);
glBindBuffer(GL_SHADER_STORAGE_BUFFER, 0);
LabelGLObject(GL_BUFFER, buffer.handle, addr, "GlobalMemory");
LabelGLObject(GL_BUFFER, buffer.handle, cpu_addr, "GlobalMemory");
}
void CachedGlobalRegion::Reload(u32 size_) {
@@ -35,7 +36,7 @@ void CachedGlobalRegion::Reload(u32 size_) {
// TODO(Rodrigo): Get rid of Memory::GetPointer with a staging buffer
glBindBuffer(GL_SHADER_STORAGE_BUFFER, buffer.handle);
glBufferData(GL_SHADER_STORAGE_BUFFER, size, Memory::GetPointer(addr), GL_DYNAMIC_DRAW);
glBufferData(GL_SHADER_STORAGE_BUFFER, size, GetHostPtr(), GL_DYNAMIC_DRAW);
}
GlobalRegion GlobalRegionCacheOpenGL::TryGetReservedGlobalRegion(VAddr addr, u32 size) const {
@@ -46,19 +47,19 @@ GlobalRegion GlobalRegionCacheOpenGL::TryGetReservedGlobalRegion(VAddr addr, u32
return search->second;
}
GlobalRegion GlobalRegionCacheOpenGL::GetUncachedGlobalRegion(VAddr addr, u32 size) {
GlobalRegion GlobalRegionCacheOpenGL::GetUncachedGlobalRegion(VAddr addr, u32 size, u8* host_ptr) {
GlobalRegion region{TryGetReservedGlobalRegion(addr, size)};
if (!region) {
// No reserved surface available, create a new one and reserve it
region = std::make_shared<CachedGlobalRegion>(addr, size);
region = std::make_shared<CachedGlobalRegion>(addr, size, host_ptr);
ReserveGlobalRegion(region);
}
region->Reload(size);
return region;
}
void GlobalRegionCacheOpenGL::ReserveGlobalRegion(const GlobalRegion& region) {
reserve[region->GetAddr()] = region;
void GlobalRegionCacheOpenGL::ReserveGlobalRegion(GlobalRegion region) {
reserve.insert_or_assign(region->GetCpuAddr(), std::move(region));
}
GlobalRegionCacheOpenGL::GlobalRegionCacheOpenGL(RasterizerOpenGL& rasterizer)
@@ -80,11 +81,12 @@ GlobalRegion GlobalRegionCacheOpenGL::GetGlobalRegion(
ASSERT(actual_addr);
// Look up global region in the cache based on address
GlobalRegion region = TryGet(*actual_addr);
const auto& host_ptr{Memory::GetPointer(*actual_addr)};
GlobalRegion region{TryGet(host_ptr)};
if (!region) {
// No global region found - create a new one
region = GetUncachedGlobalRegion(*actual_addr, size);
region = GetUncachedGlobalRegion(*actual_addr, size, host_ptr);
Register(region);
}

View File

@@ -27,15 +27,13 @@ using GlobalRegion = std::shared_ptr<CachedGlobalRegion>;
class CachedGlobalRegion final : public RasterizerCacheObject {
public:
explicit CachedGlobalRegion(VAddr addr, u32 size);
explicit CachedGlobalRegion(VAddr cpu_addr, u32 size, u8* host_ptr);
/// Gets the address of the shader in guest memory, required for cache management
VAddr GetAddr() const {
return addr;
VAddr GetCpuAddr() const override {
return cpu_addr;
}
/// Gets the size of the shader in guest memory, required for cache management
std::size_t GetSizeInBytes() const {
std::size_t GetSizeInBytes() const override {
return size;
}
@@ -53,9 +51,8 @@ public:
}
private:
VAddr addr{};
VAddr cpu_addr{};
u32 size{};
OGLBuffer buffer;
};
@@ -69,8 +66,8 @@ public:
private:
GlobalRegion TryGetReservedGlobalRegion(VAddr addr, u32 size) const;
GlobalRegion GetUncachedGlobalRegion(VAddr addr, u32 size);
void ReserveGlobalRegion(const GlobalRegion& region);
GlobalRegion GetUncachedGlobalRegion(VAddr addr, u32 size, u8* host_ptr);
void ReserveGlobalRegion(GlobalRegion region);
std::unordered_map<VAddr, GlobalRegion> reserve;
};

View File

@@ -102,8 +102,9 @@ struct FramebufferCacheKey {
RasterizerOpenGL::RasterizerOpenGL(Core::Frontend::EmuWindow& window, Core::System& system,
ScreenInfo& info)
: res_cache{*this}, shader_cache{*this, system}, global_cache{*this}, emu_window{window},
screen_info{info}, buffer_cache(*this, STREAM_BUFFER_SIZE) {
: res_cache{*this}, shader_cache{*this, system}, global_cache{*this},
emu_window{window}, system{system}, screen_info{info},
buffer_cache(*this, STREAM_BUFFER_SIZE) {
// Create sampler objects
for (std::size_t i = 0; i < texture_samplers.size(); ++i) {
texture_samplers[i].Create();
@@ -118,7 +119,7 @@ RasterizerOpenGL::RasterizerOpenGL(Core::Frontend::EmuWindow& window, Core::Syst
glGetIntegerv(GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT, &uniform_buffer_alignment);
LOG_CRITICAL(Render_OpenGL, "Sync fixed function OpenGL state here!");
LOG_DEBUG(Render_OpenGL, "Sync fixed function OpenGL state here");
CheckExtensions();
}
@@ -138,7 +139,7 @@ void RasterizerOpenGL::CheckExtensions() {
}
GLuint RasterizerOpenGL::SetupVertexFormat() {
auto& gpu = Core::System::GetInstance().GPU().Maxwell3D();
auto& gpu = system.GPU().Maxwell3D();
const auto& regs = gpu.regs;
if (!gpu.dirty_flags.vertex_attrib_format) {
@@ -177,7 +178,7 @@ GLuint RasterizerOpenGL::SetupVertexFormat() {
continue;
const auto& buffer = regs.vertex_array[attrib.buffer];
LOG_TRACE(HW_GPU,
LOG_TRACE(Render_OpenGL,
"vertex attrib {}, count={}, size={}, type={}, offset={}, normalize={}",
index, attrib.ComponentCount(), attrib.SizeString(), attrib.TypeString(),
attrib.offset.Value(), attrib.IsNormalized());
@@ -207,7 +208,7 @@ GLuint RasterizerOpenGL::SetupVertexFormat() {
}
void RasterizerOpenGL::SetupVertexBuffer(GLuint vao) {
auto& gpu = Core::System::GetInstance().GPU().Maxwell3D();
auto& gpu = system.GPU().Maxwell3D();
const auto& regs = gpu.regs;
if (gpu.dirty_flags.vertex_array.none())
@@ -248,7 +249,7 @@ void RasterizerOpenGL::SetupVertexBuffer(GLuint vao) {
}
DrawParameters RasterizerOpenGL::SetupDraw() {
const auto& gpu = Core::System::GetInstance().GPU().Maxwell3D();
const auto& gpu = system.GPU().Maxwell3D();
const auto& regs = gpu.regs;
const bool is_indexed = accelerate_draw == AccelDraw::Indexed;
@@ -297,7 +298,7 @@ DrawParameters RasterizerOpenGL::SetupDraw() {
void RasterizerOpenGL::SetupShaders(GLenum primitive_mode) {
MICROPROFILE_SCOPE(OpenGL_Shader);
auto& gpu = Core::System::GetInstance().GPU().Maxwell3D();
auto& gpu = system.GPU().Maxwell3D();
BaseBindings base_bindings;
std::array<bool, Maxwell::NumClipDistances> clip_distances{};
@@ -343,9 +344,8 @@ void RasterizerOpenGL::SetupShaders(GLenum primitive_mode) {
shader_program_manager->UseProgrammableFragmentShader(program_handle);
break;
default:
LOG_CRITICAL(HW_GPU, "Unimplemented shader index={}, enable={}, offset=0x{:08X}", index,
shader_config.enable.Value(), shader_config.offset);
UNREACHABLE();
UNIMPLEMENTED_MSG("Unimplemented shader index={}, enable={}, offset=0x{:08X}", index,
shader_config.enable.Value(), shader_config.offset);
}
const auto stage_enum = static_cast<Maxwell::ShaderStage>(stage);
@@ -414,7 +414,7 @@ void RasterizerOpenGL::SetupCachedFramebuffer(const FramebufferCacheKey& fbkey,
}
std::size_t RasterizerOpenGL::CalculateVertexArraysSize() const {
const auto& regs = Core::System::GetInstance().GPU().Maxwell3D().regs;
const auto& regs = system.GPU().Maxwell3D().regs;
std::size_t size = 0;
for (u32 index = 0; index < Maxwell::NumVertexArrays; ++index) {
@@ -432,7 +432,7 @@ std::size_t RasterizerOpenGL::CalculateVertexArraysSize() const {
}
std::size_t RasterizerOpenGL::CalculateIndexBufferSize() const {
const auto& regs = Core::System::GetInstance().GPU().Maxwell3D().regs;
const auto& regs = system.GPU().Maxwell3D().regs;
return static_cast<std::size_t>(regs.index_array.count) *
static_cast<std::size_t>(regs.index_array.FormatSizeInBytes());
@@ -449,7 +449,7 @@ static constexpr auto RangeFromInterval(Map& map, const Interval& interval) {
return boost::make_iterator_range(map.equal_range(interval));
}
void RasterizerOpenGL::UpdatePagesCachedCount(Tegra::GPUVAddr addr, u64 size, int delta) {
void RasterizerOpenGL::UpdatePagesCachedCount(VAddr addr, u64 size, int delta) {
const u64 page_start{addr >> Memory::PAGE_BITS};
const u64 page_end{(addr + size + Memory::PAGE_SIZE - 1) >> Memory::PAGE_BITS};
@@ -488,7 +488,7 @@ std::pair<bool, bool> RasterizerOpenGL::ConfigureFramebuffers(
OpenGLState& current_state, bool using_color_fb, bool using_depth_fb, bool preserve_contents,
std::optional<std::size_t> single_color_target) {
MICROPROFILE_SCOPE(OpenGL_Framebuffer);
auto& gpu = Core::System::GetInstance().GPU().Maxwell3D();
auto& gpu = system.GPU().Maxwell3D();
const auto& regs = gpu.regs;
const FramebufferConfigState fb_config_state{using_color_fb, using_depth_fb, preserve_contents,
@@ -582,7 +582,7 @@ void RasterizerOpenGL::Clear() {
const auto prev_state{state};
SCOPE_EXIT({ prev_state.Apply(); });
const auto& regs = Core::System::GetInstance().GPU().Maxwell3D().regs;
const auto& regs = system.GPU().Maxwell3D().regs;
bool use_color{};
bool use_depth{};
bool use_stencil{};
@@ -673,7 +673,7 @@ void RasterizerOpenGL::DrawArrays() {
return;
MICROPROFILE_SCOPE(OpenGL_Drawing);
auto& gpu = Core::System::GetInstance().GPU().Maxwell3D();
auto& gpu = system.GPU().Maxwell3D();
const auto& regs = gpu.regs;
ConfigureFramebuffers(state);
@@ -747,12 +747,12 @@ void RasterizerOpenGL::DrawArrays() {
void RasterizerOpenGL::FlushAll() {}
void RasterizerOpenGL::FlushRegion(VAddr addr, u64 size) {
void RasterizerOpenGL::FlushRegion(CacheAddr addr, u64 size) {
MICROPROFILE_SCOPE(OpenGL_CacheManagement);
res_cache.FlushRegion(addr, size);
}
void RasterizerOpenGL::InvalidateRegion(VAddr addr, u64 size) {
void RasterizerOpenGL::InvalidateRegion(CacheAddr addr, u64 size) {
MICROPROFILE_SCOPE(OpenGL_CacheManagement);
res_cache.InvalidateRegion(addr, size);
shader_cache.InvalidateRegion(addr, size);
@@ -760,7 +760,7 @@ void RasterizerOpenGL::InvalidateRegion(VAddr addr, u64 size) {
buffer_cache.InvalidateRegion(addr, size);
}
void RasterizerOpenGL::FlushAndInvalidateRegion(VAddr addr, u64 size) {
void RasterizerOpenGL::FlushAndInvalidateRegion(CacheAddr addr, u64 size) {
FlushRegion(addr, size);
InvalidateRegion(addr, size);
}
@@ -782,7 +782,7 @@ bool RasterizerOpenGL::AccelerateDisplay(const Tegra::FramebufferConfig& config,
MICROPROFILE_SCOPE(OpenGL_CacheManagement);
const auto& surface{res_cache.TryFindFramebufferSurface(framebuffer_addr)};
const auto& surface{res_cache.TryFindFramebufferSurface(Memory::GetPointer(framebuffer_addr))};
if (!surface) {
return {};
}
@@ -793,7 +793,10 @@ bool RasterizerOpenGL::AccelerateDisplay(const Tegra::FramebufferConfig& config,
VideoCore::Surface::PixelFormatFromGPUPixelFormat(config.pixel_format)};
ASSERT_MSG(params.width == config.width, "Framebuffer width is different");
ASSERT_MSG(params.height == config.height, "Framebuffer height is different");
ASSERT_MSG(params.pixel_format == pixel_format, "Framebuffer pixel_format is different");
if (params.pixel_format != pixel_format) {
LOG_WARNING(Render_OpenGL, "Framebuffer pixel_format is different");
}
screen_info.display_texture = surface->Texture().handle;
@@ -802,104 +805,87 @@ bool RasterizerOpenGL::AccelerateDisplay(const Tegra::FramebufferConfig& config,
void RasterizerOpenGL::SamplerInfo::Create() {
sampler.Create();
mag_filter = min_filter = Tegra::Texture::TextureFilter::Linear;
wrap_u = wrap_v = wrap_p = Tegra::Texture::WrapMode::Wrap;
uses_depth_compare = false;
mag_filter = Tegra::Texture::TextureFilter::Linear;
min_filter = Tegra::Texture::TextureFilter::Linear;
wrap_u = Tegra::Texture::WrapMode::Wrap;
wrap_v = Tegra::Texture::WrapMode::Wrap;
wrap_p = Tegra::Texture::WrapMode::Wrap;
use_depth_compare = false;
depth_compare_func = Tegra::Texture::DepthCompareFunc::Never;
// default is GL_LINEAR_MIPMAP_LINEAR
// OpenGL's default is GL_LINEAR_MIPMAP_LINEAR
glSamplerParameteri(sampler.handle, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
// Other attributes have correct defaults
glSamplerParameteri(sampler.handle, GL_TEXTURE_COMPARE_FUNC, GL_NEVER);
// Other attributes have correct defaults
}
void RasterizerOpenGL::SamplerInfo::SyncWithConfig(const Tegra::Texture::TSCEntry& config) {
const GLuint s = sampler.handle;
const GLuint sampler_id = sampler.handle;
if (mag_filter != config.mag_filter) {
mag_filter = config.mag_filter;
glSamplerParameteri(
s, GL_TEXTURE_MAG_FILTER,
sampler_id, GL_TEXTURE_MAG_FILTER,
MaxwellToGL::TextureFilterMode(mag_filter, Tegra::Texture::TextureMipmapFilter::None));
}
if (min_filter != config.min_filter || mip_filter != config.mip_filter) {
if (min_filter != config.min_filter || mipmap_filter != config.mipmap_filter) {
min_filter = config.min_filter;
mip_filter = config.mip_filter;
glSamplerParameteri(s, GL_TEXTURE_MIN_FILTER,
MaxwellToGL::TextureFilterMode(min_filter, mip_filter));
mipmap_filter = config.mipmap_filter;
glSamplerParameteri(sampler_id, GL_TEXTURE_MIN_FILTER,
MaxwellToGL::TextureFilterMode(min_filter, mipmap_filter));
}
if (wrap_u != config.wrap_u) {
wrap_u = config.wrap_u;
glSamplerParameteri(s, GL_TEXTURE_WRAP_S, MaxwellToGL::WrapMode(wrap_u));
glSamplerParameteri(sampler_id, GL_TEXTURE_WRAP_S, MaxwellToGL::WrapMode(wrap_u));
}
if (wrap_v != config.wrap_v) {
wrap_v = config.wrap_v;
glSamplerParameteri(s, GL_TEXTURE_WRAP_T, MaxwellToGL::WrapMode(wrap_v));
glSamplerParameteri(sampler_id, GL_TEXTURE_WRAP_T, MaxwellToGL::WrapMode(wrap_v));
}
if (wrap_p != config.wrap_p) {
wrap_p = config.wrap_p;
glSamplerParameteri(s, GL_TEXTURE_WRAP_R, MaxwellToGL::WrapMode(wrap_p));
glSamplerParameteri(sampler_id, GL_TEXTURE_WRAP_R, MaxwellToGL::WrapMode(wrap_p));
}
if (uses_depth_compare != (config.depth_compare_enabled == 1)) {
uses_depth_compare = (config.depth_compare_enabled == 1);
if (uses_depth_compare) {
glSamplerParameteri(s, GL_TEXTURE_COMPARE_MODE, GL_COMPARE_REF_TO_TEXTURE);
} else {
glSamplerParameteri(s, GL_TEXTURE_COMPARE_MODE, GL_NONE);
}
if (const bool enabled = config.depth_compare_enabled == 1; use_depth_compare != enabled) {
use_depth_compare = enabled;
glSamplerParameteri(sampler_id, GL_TEXTURE_COMPARE_MODE,
use_depth_compare ? GL_COMPARE_REF_TO_TEXTURE : GL_NONE);
}
if (depth_compare_func != config.depth_compare_func) {
depth_compare_func = config.depth_compare_func;
glSamplerParameteri(s, GL_TEXTURE_COMPARE_FUNC,
glSamplerParameteri(sampler_id, GL_TEXTURE_COMPARE_FUNC,
MaxwellToGL::DepthCompareFunc(depth_compare_func));
}
GLvec4 new_border_color;
if (config.srgb_conversion) {
new_border_color[0] = config.srgb_border_color_r / 255.0f;
new_border_color[1] = config.srgb_border_color_g / 255.0f;
new_border_color[2] = config.srgb_border_color_g / 255.0f;
} else {
new_border_color[0] = config.border_color_r;
new_border_color[1] = config.border_color_g;
new_border_color[2] = config.border_color_b;
}
new_border_color[3] = config.border_color_a;
if (border_color != new_border_color) {
if (const auto new_border_color = config.GetBorderColor(); border_color != new_border_color) {
border_color = new_border_color;
glSamplerParameterfv(s, GL_TEXTURE_BORDER_COLOR, border_color.data());
glSamplerParameterfv(sampler_id, GL_TEXTURE_BORDER_COLOR, border_color.data());
}
const float anisotropic_max = static_cast<float>(1 << config.max_anisotropy.Value());
if (anisotropic_max != max_anisotropic) {
max_anisotropic = anisotropic_max;
if (const float anisotropic = config.GetMaxAnisotropy(); max_anisotropic != anisotropic) {
max_anisotropic = anisotropic;
if (GLAD_GL_ARB_texture_filter_anisotropic) {
glSamplerParameterf(s, GL_TEXTURE_MAX_ANISOTROPY, max_anisotropic);
glSamplerParameterf(sampler_id, GL_TEXTURE_MAX_ANISOTROPY, max_anisotropic);
} else if (GLAD_GL_EXT_texture_filter_anisotropic) {
glSamplerParameterf(s, GL_TEXTURE_MAX_ANISOTROPY_EXT, max_anisotropic);
glSamplerParameterf(sampler_id, GL_TEXTURE_MAX_ANISOTROPY_EXT, max_anisotropic);
}
}
const float lod_min = static_cast<float>(config.min_lod_clamp.Value()) / 256.0f;
if (lod_min != min_lod) {
min_lod = lod_min;
glSamplerParameterf(s, GL_TEXTURE_MIN_LOD, min_lod);
if (const float min = config.GetMinLod(); min_lod != min) {
min_lod = min;
glSamplerParameterf(sampler_id, GL_TEXTURE_MIN_LOD, min_lod);
}
if (const float max = config.GetMaxLod(); max_lod != max) {
max_lod = max;
glSamplerParameterf(sampler_id, GL_TEXTURE_MAX_LOD, max_lod);
}
const float lod_max = static_cast<float>(config.max_lod_clamp.Value()) / 256.0f;
if (lod_max != max_lod) {
max_lod = lod_max;
glSamplerParameterf(s, GL_TEXTURE_MAX_LOD, max_lod);
}
const u32 bias = config.mip_lod_bias.Value();
// Sign extend the 13-bit value.
constexpr u32 mask = 1U << (13 - 1);
const float bias_lod = static_cast<s32>((bias ^ mask) - mask) / 256.f;
if (lod_bias != bias_lod) {
lod_bias = bias_lod;
glSamplerParameterf(s, GL_TEXTURE_LOD_BIAS, lod_bias);
if (const float bias = config.GetLodBias(); lod_bias != bias) {
lod_bias = bias;
glSamplerParameterf(sampler_id, GL_TEXTURE_LOD_BIAS, lod_bias);
}
}
@@ -907,7 +893,7 @@ void RasterizerOpenGL::SetupConstBuffers(Tegra::Engines::Maxwell3D::Regs::Shader
const Shader& shader, GLuint program_handle,
BaseBindings base_bindings) {
MICROPROFILE_SCOPE(OpenGL_UBO);
const auto& gpu = Core::System::GetInstance().GPU();
const auto& gpu = system.GPU();
const auto& maxwell3d = gpu.Maxwell3D();
const auto& shader_stage = maxwell3d.state.shader_stages[static_cast<std::size_t>(stage)];
const auto& entries = shader->GetShaderEntries().const_buffers;
@@ -939,8 +925,8 @@ void RasterizerOpenGL::SetupConstBuffers(Tegra::Engines::Maxwell3D::Regs::Shader
size = buffer.size;
if (size > MaxConstbufferSize) {
LOG_CRITICAL(HW_GPU, "indirect constbuffer size {} exceeds maximum {}", size,
MaxConstbufferSize);
LOG_WARNING(Render_OpenGL, "Indirect constbuffer size {} exceeds maximum {}", size,
MaxConstbufferSize);
size = MaxConstbufferSize;
}
} else {
@@ -986,7 +972,7 @@ void RasterizerOpenGL::SetupGlobalRegions(Tegra::Engines::Maxwell3D::Regs::Shade
void RasterizerOpenGL::SetupTextures(Maxwell::ShaderStage stage, const Shader& shader,
GLuint program_handle, BaseBindings base_bindings) {
MICROPROFILE_SCOPE(OpenGL_Texture);
const auto& gpu = Core::System::GetInstance().GPU();
const auto& gpu = system.GPU();
const auto& maxwell3d = gpu.Maxwell3D();
const auto& entries = shader->GetShaderEntries().samplers;
@@ -1000,10 +986,9 @@ void RasterizerOpenGL::SetupTextures(Maxwell::ShaderStage stage, const Shader& s
texture_samplers[current_bindpoint].SyncWithConfig(texture.tsc);
Surface surface = res_cache.GetTextureSurface(texture, entry);
if (surface != nullptr) {
if (Surface surface = res_cache.GetTextureSurface(texture, entry); surface) {
state.texture_units[current_bindpoint].texture =
entry.IsArray() ? surface->TextureLayer().handle : surface->Texture().handle;
surface->Texture(entry.IsArray()).handle;
surface->UpdateSwizzle(texture.tic.x_source, texture.tic.y_source, texture.tic.z_source,
texture.tic.w_source);
} else {
@@ -1014,7 +999,7 @@ void RasterizerOpenGL::SetupTextures(Maxwell::ShaderStage stage, const Shader& s
}
void RasterizerOpenGL::SyncViewport(OpenGLState& current_state) {
const auto& regs = Core::System::GetInstance().GPU().Maxwell3D().regs;
const auto& regs = system.GPU().Maxwell3D().regs;
const bool geometry_shaders_enabled =
regs.IsShaderConfigEnabled(static_cast<size_t>(Maxwell::ShaderProgram::Geometry));
const std::size_t viewport_count =
@@ -1037,7 +1022,7 @@ void RasterizerOpenGL::SyncViewport(OpenGLState& current_state) {
void RasterizerOpenGL::SyncClipEnabled(
const std::array<bool, Maxwell::Regs::NumClipDistances>& clip_mask) {
const auto& regs = Core::System::GetInstance().GPU().Maxwell3D().regs;
const auto& regs = system.GPU().Maxwell3D().regs;
const std::array<bool, Maxwell::Regs::NumClipDistances> reg_state{
regs.clip_distance_enabled.c0 != 0, regs.clip_distance_enabled.c1 != 0,
regs.clip_distance_enabled.c2 != 0, regs.clip_distance_enabled.c3 != 0,
@@ -1054,7 +1039,7 @@ void RasterizerOpenGL::SyncClipCoef() {
}
void RasterizerOpenGL::SyncCullMode() {
const auto& regs = Core::System::GetInstance().GPU().Maxwell3D().regs;
const auto& regs = system.GPU().Maxwell3D().regs;
state.cull.enabled = regs.cull.enabled != 0;
@@ -1078,14 +1063,14 @@ void RasterizerOpenGL::SyncCullMode() {
}
void RasterizerOpenGL::SyncPrimitiveRestart() {
const auto& regs = Core::System::GetInstance().GPU().Maxwell3D().regs;
const auto& regs = system.GPU().Maxwell3D().regs;
state.primitive_restart.enabled = regs.primitive_restart.enabled;
state.primitive_restart.index = regs.primitive_restart.index;
}
void RasterizerOpenGL::SyncDepthTestState() {
const auto& regs = Core::System::GetInstance().GPU().Maxwell3D().regs;
const auto& regs = system.GPU().Maxwell3D().regs;
state.depth.test_enabled = regs.depth_test_enable != 0;
state.depth.write_mask = regs.depth_write_enabled ? GL_TRUE : GL_FALSE;
@@ -1097,7 +1082,7 @@ void RasterizerOpenGL::SyncDepthTestState() {
}
void RasterizerOpenGL::SyncStencilTestState() {
const auto& regs = Core::System::GetInstance().GPU().Maxwell3D().regs;
const auto& regs = system.GPU().Maxwell3D().regs;
state.stencil.test_enabled = regs.stencil_enable != 0;
if (!regs.stencil_enable) {
@@ -1131,7 +1116,7 @@ void RasterizerOpenGL::SyncStencilTestState() {
}
void RasterizerOpenGL::SyncColorMask() {
const auto& regs = Core::System::GetInstance().GPU().Maxwell3D().regs;
const auto& regs = system.GPU().Maxwell3D().regs;
const std::size_t count =
regs.independent_blend_enable ? Tegra::Engines::Maxwell3D::Regs::NumRenderTargets : 1;
for (std::size_t i = 0; i < count; i++) {
@@ -1145,18 +1130,18 @@ void RasterizerOpenGL::SyncColorMask() {
}
void RasterizerOpenGL::SyncMultiSampleState() {
const auto& regs = Core::System::GetInstance().GPU().Maxwell3D().regs;
const auto& regs = system.GPU().Maxwell3D().regs;
state.multisample_control.alpha_to_coverage = regs.multisample_control.alpha_to_coverage != 0;
state.multisample_control.alpha_to_one = regs.multisample_control.alpha_to_one != 0;
}
void RasterizerOpenGL::SyncFragmentColorClampState() {
const auto& regs = Core::System::GetInstance().GPU().Maxwell3D().regs;
const auto& regs = system.GPU().Maxwell3D().regs;
state.fragment_color_clamp.enabled = regs.frag_color_clamp != 0;
}
void RasterizerOpenGL::SyncBlendState() {
const auto& regs = Core::System::GetInstance().GPU().Maxwell3D().regs;
const auto& regs = system.GPU().Maxwell3D().regs;
state.blend_color.red = regs.blend_color.r;
state.blend_color.green = regs.blend_color.g;
@@ -1198,7 +1183,7 @@ void RasterizerOpenGL::SyncBlendState() {
}
void RasterizerOpenGL::SyncLogicOpState() {
const auto& regs = Core::System::GetInstance().GPU().Maxwell3D().regs;
const auto& regs = system.GPU().Maxwell3D().regs;
state.logic_op.enabled = regs.logic_op.enable != 0;
@@ -1212,7 +1197,7 @@ void RasterizerOpenGL::SyncLogicOpState() {
}
void RasterizerOpenGL::SyncScissorTest(OpenGLState& current_state) {
const auto& regs = Core::System::GetInstance().GPU().Maxwell3D().regs;
const auto& regs = system.GPU().Maxwell3D().regs;
const bool geometry_shaders_enabled =
regs.IsShaderConfigEnabled(static_cast<size_t>(Maxwell::ShaderProgram::Geometry));
const std::size_t viewport_count =
@@ -1234,21 +1219,17 @@ void RasterizerOpenGL::SyncScissorTest(OpenGLState& current_state) {
}
void RasterizerOpenGL::SyncTransformFeedback() {
const auto& regs = Core::System::GetInstance().GPU().Maxwell3D().regs;
if (regs.tfb_enabled != 0) {
LOG_CRITICAL(Render_OpenGL, "Transform feedbacks are not implemented");
UNREACHABLE();
}
const auto& regs = system.GPU().Maxwell3D().regs;
UNIMPLEMENTED_IF_MSG(regs.tfb_enabled != 0, "Transform feedbacks are not implemented");
}
void RasterizerOpenGL::SyncPointState() {
const auto& regs = Core::System::GetInstance().GPU().Maxwell3D().regs;
const auto& regs = system.GPU().Maxwell3D().regs;
state.point.size = regs.point_size;
}
void RasterizerOpenGL::SyncPolygonOffset() {
const auto& regs = Core::System::GetInstance().GPU().Maxwell3D().regs;
const auto& regs = system.GPU().Maxwell3D().regs;
state.polygon_offset.fill_enable = regs.polygon_offset_fill_enable != 0;
state.polygon_offset.line_enable = regs.polygon_offset_line_enable != 0;
state.polygon_offset.point_enable = regs.polygon_offset_point_enable != 0;
@@ -1258,13 +1239,9 @@ void RasterizerOpenGL::SyncPolygonOffset() {
}
void RasterizerOpenGL::CheckAlphaTests() {
const auto& regs = Core::System::GetInstance().GPU().Maxwell3D().regs;
if (regs.alpha_test_enabled != 0 && regs.rt_control.count > 1) {
LOG_CRITICAL(Render_OpenGL, "Alpha Testing is enabled with Multiple Render Targets, "
"this behavior is undefined.");
UNREACHABLE();
}
const auto& regs = system.GPU().Maxwell3D().regs;
UNIMPLEMENTED_IF_MSG(regs.alpha_test_enabled != 0 && regs.rt_control.count > 1,
"Alpha Testing is enabled with more than one rendertarget");
}
} // namespace OpenGL

View File

@@ -57,9 +57,9 @@ public:
void DrawArrays() override;
void Clear() override;
void FlushAll() override;
void FlushRegion(VAddr addr, u64 size) override;
void InvalidateRegion(VAddr addr, u64 size) override;
void FlushAndInvalidateRegion(VAddr addr, u64 size) override;
void FlushRegion(CacheAddr addr, u64 size) override;
void InvalidateRegion(CacheAddr addr, u64 size) override;
void FlushAndInvalidateRegion(CacheAddr addr, u64 size) override;
bool AccelerateSurfaceCopy(const Tegra::Engines::Fermi2D::Regs::Surface& src,
const Tegra::Engines::Fermi2D::Regs::Surface& dst,
const Common::Rectangle<u32>& src_rect,
@@ -67,7 +67,7 @@ public:
bool AccelerateDisplay(const Tegra::FramebufferConfig& config, VAddr framebuffer_addr,
u32 pixel_stride) override;
bool AccelerateDrawBatch(bool is_indexed) override;
void UpdatePagesCachedCount(Tegra::GPUVAddr addr, u64 size, int delta) override;
void UpdatePagesCachedCount(VAddr addr, u64 size, int delta) override;
void LoadDiskResources(const std::atomic_bool& stop_loading,
const VideoCore::DiskResourceLoadCallback& callback) override;
@@ -94,11 +94,12 @@ private:
private:
Tegra::Texture::TextureFilter mag_filter = Tegra::Texture::TextureFilter::Nearest;
Tegra::Texture::TextureFilter min_filter = Tegra::Texture::TextureFilter::Nearest;
Tegra::Texture::TextureMipmapFilter mip_filter = Tegra::Texture::TextureMipmapFilter::None;
Tegra::Texture::TextureMipmapFilter mipmap_filter =
Tegra::Texture::TextureMipmapFilter::None;
Tegra::Texture::WrapMode wrap_u = Tegra::Texture::WrapMode::ClampToEdge;
Tegra::Texture::WrapMode wrap_v = Tegra::Texture::WrapMode::ClampToEdge;
Tegra::Texture::WrapMode wrap_p = Tegra::Texture::WrapMode::ClampToEdge;
bool uses_depth_compare = false;
bool use_depth_compare = false;
Tegra::Texture::DepthCompareFunc depth_compare_func =
Tegra::Texture::DepthCompareFunc::Always;
GLvec4 border_color = {};
@@ -214,6 +215,7 @@ private:
GlobalRegionCacheOpenGL global_cache;
Core::Frontend::EmuWindow& emu_window;
Core::System& system;
ScreenInfo& screen_info;

View File

@@ -21,7 +21,7 @@
#include "video_core/renderer_opengl/gl_rasterizer_cache.h"
#include "video_core/renderer_opengl/utils.h"
#include "video_core/surface.h"
#include "video_core/textures/astc.h"
#include "video_core/textures/convert.h"
#include "video_core/textures/decoders.h"
namespace OpenGL {
@@ -61,6 +61,7 @@ void SurfaceParams::InitCacheParameters(Tegra::GPUVAddr gpu_addr_) {
addr = cpu_addr ? *cpu_addr : 0;
gpu_addr = gpu_addr_;
host_ptr = Memory::GetPointer(addr);
size_in_bytes = SizeInBytesRaw();
if (IsPixelFormatASTC(pixel_format)) {
@@ -400,6 +401,27 @@ static const FormatTuple& GetFormatTuple(PixelFormat pixel_format, ComponentType
return format;
}
/// Returns the discrepant array target
constexpr GLenum GetArrayDiscrepantTarget(SurfaceTarget target) {
switch (target) {
case SurfaceTarget::Texture1D:
return GL_TEXTURE_1D_ARRAY;
case SurfaceTarget::Texture2D:
return GL_TEXTURE_2D_ARRAY;
case SurfaceTarget::Texture3D:
return GL_NONE;
case SurfaceTarget::Texture1DArray:
return GL_TEXTURE_1D;
case SurfaceTarget::Texture2DArray:
return GL_TEXTURE_2D;
case SurfaceTarget::TextureCubemap:
return GL_TEXTURE_CUBE_MAP_ARRAY;
case SurfaceTarget::TextureCubeArray:
return GL_TEXTURE_CUBE_MAP;
}
return GL_NONE;
}
Common::Rectangle<u32> SurfaceParams::GetRect(u32 mip_level) const {
u32 actual_height{std::max(1U, unaligned_height >> mip_level)};
if (IsPixelFormatASTC(pixel_format)) {
@@ -425,7 +447,7 @@ void SwizzleFunc(const MortonSwizzleMode& mode, const SurfaceParams& params,
MortonSwizzle(mode, params.pixel_format, params.MipWidth(mip_level),
params.MipBlockHeight(mip_level), params.MipHeight(mip_level),
params.MipBlockDepth(mip_level), 1, params.tile_width_spacing,
gl_buffer.data() + offset_gl, gl_size, params.addr + offset);
gl_buffer.data() + offset_gl, params.addr + offset);
offset += layer_size;
offset_gl += gl_size;
}
@@ -434,7 +456,7 @@ void SwizzleFunc(const MortonSwizzleMode& mode, const SurfaceParams& params,
MortonSwizzle(mode, params.pixel_format, params.MipWidth(mip_level),
params.MipBlockHeight(mip_level), params.MipHeight(mip_level),
params.MipBlockDepth(mip_level), depth, params.tile_width_spacing,
gl_buffer.data(), gl_buffer.size(), params.addr + offset);
gl_buffer.data(), params.addr + offset);
}
}
@@ -542,8 +564,8 @@ void RasterizerCacheOpenGL::CopySurface(const Surface& src_surface, const Surfac
}
CachedSurface::CachedSurface(const SurfaceParams& params)
: params(params), gl_target(SurfaceTargetToGL(params.target)),
cached_size_in_bytes(params.size_in_bytes) {
: params{params}, gl_target{SurfaceTargetToGL(params.target)},
cached_size_in_bytes{params.size_in_bytes}, RasterizerCacheObject{params.host_ptr} {
texture.Create(gl_target);
// TODO(Rodrigo): Using params.GetRect() returns a different size than using its Mip*(0)
@@ -597,103 +619,6 @@ CachedSurface::CachedSurface(const SurfaceParams& params)
}
}
static void ConvertS8Z24ToZ24S8(std::vector<u8>& data, u32 width, u32 height, bool reverse) {
union S8Z24 {
BitField<0, 24, u32> z24;
BitField<24, 8, u32> s8;
};
static_assert(sizeof(S8Z24) == 4, "S8Z24 is incorrect size");
union Z24S8 {
BitField<0, 8, u32> s8;
BitField<8, 24, u32> z24;
};
static_assert(sizeof(Z24S8) == 4, "Z24S8 is incorrect size");
S8Z24 s8z24_pixel{};
Z24S8 z24s8_pixel{};
constexpr auto bpp{GetBytesPerPixel(PixelFormat::S8Z24)};
for (std::size_t y = 0; y < height; ++y) {
for (std::size_t x = 0; x < width; ++x) {
const std::size_t offset{bpp * (y * width + x)};
if (reverse) {
std::memcpy(&z24s8_pixel, &data[offset], sizeof(Z24S8));
s8z24_pixel.s8.Assign(z24s8_pixel.s8);
s8z24_pixel.z24.Assign(z24s8_pixel.z24);
std::memcpy(&data[offset], &s8z24_pixel, sizeof(S8Z24));
} else {
std::memcpy(&s8z24_pixel, &data[offset], sizeof(S8Z24));
z24s8_pixel.s8.Assign(s8z24_pixel.s8);
z24s8_pixel.z24.Assign(s8z24_pixel.z24);
std::memcpy(&data[offset], &z24s8_pixel, sizeof(Z24S8));
}
}
}
}
/**
* Helper function to perform software conversion (as needed) when loading a buffer from Switch
* memory. This is for Maxwell pixel formats that cannot be represented as-is in OpenGL or with
* typical desktop GPUs.
*/
static void ConvertFormatAsNeeded_LoadGLBuffer(std::vector<u8>& data, PixelFormat pixel_format,
u32 width, u32 height, u32 depth) {
switch (pixel_format) {
case PixelFormat::ASTC_2D_4X4:
case PixelFormat::ASTC_2D_8X8:
case PixelFormat::ASTC_2D_8X5:
case PixelFormat::ASTC_2D_5X4:
case PixelFormat::ASTC_2D_5X5:
case PixelFormat::ASTC_2D_4X4_SRGB:
case PixelFormat::ASTC_2D_8X8_SRGB:
case PixelFormat::ASTC_2D_8X5_SRGB:
case PixelFormat::ASTC_2D_5X4_SRGB:
case PixelFormat::ASTC_2D_5X5_SRGB:
case PixelFormat::ASTC_2D_10X8:
case PixelFormat::ASTC_2D_10X8_SRGB: {
// Convert ASTC pixel formats to RGBA8, as most desktop GPUs do not support ASTC.
u32 block_width{};
u32 block_height{};
std::tie(block_width, block_height) = GetASTCBlockSize(pixel_format);
data =
Tegra::Texture::ASTC::Decompress(data, width, height, depth, block_width, block_height);
break;
}
case PixelFormat::S8Z24:
// Convert the S8Z24 depth format to Z24S8, as OpenGL does not support S8Z24.
ConvertS8Z24ToZ24S8(data, width, height, false);
break;
}
}
/**
* Helper function to perform software conversion (as needed) when flushing a buffer from OpenGL to
* Switch memory. This is for Maxwell pixel formats that cannot be represented as-is in OpenGL or
* with typical desktop GPUs.
*/
static void ConvertFormatAsNeeded_FlushGLBuffer(std::vector<u8>& data, PixelFormat pixel_format,
u32 width, u32 height) {
switch (pixel_format) {
case PixelFormat::ASTC_2D_4X4:
case PixelFormat::ASTC_2D_8X8:
case PixelFormat::ASTC_2D_4X4_SRGB:
case PixelFormat::ASTC_2D_8X8_SRGB:
case PixelFormat::ASTC_2D_5X5:
case PixelFormat::ASTC_2D_5X5_SRGB:
case PixelFormat::ASTC_2D_10X8:
case PixelFormat::ASTC_2D_10X8_SRGB: {
LOG_CRITICAL(HW_GPU, "Conversion of format {} after texture flushing is not implemented",
static_cast<u32>(pixel_format));
UNREACHABLE();
break;
}
case PixelFormat::S8Z24:
// Convert the Z24S8 depth format to S8Z24, as OpenGL does not support S8Z24.
ConvertS8Z24ToZ24S8(data, width, height, true);
break;
}
}
MICROPROFILE_DEFINE(OpenGL_SurfaceLoad, "OpenGL", "Surface Load", MP_RGB(128, 192, 64));
void CachedSurface::LoadGLBuffer() {
MICROPROFILE_SCOPE(OpenGL_SurfaceLoad);
@@ -709,10 +634,9 @@ void CachedSurface::LoadGLBuffer() {
const u32 bpp = params.GetFormatBpp() / 8;
const u32 copy_size = params.width * bpp;
if (params.pitch == copy_size) {
std::memcpy(gl_buffer[0].data(), Memory::GetPointer(params.addr),
params.size_in_bytes_gl);
std::memcpy(gl_buffer[0].data(), params.host_ptr, params.size_in_bytes_gl);
} else {
const u8* start = Memory::GetPointer(params.addr);
const u8* start{params.host_ptr};
u8* write_to = gl_buffer[0].data();
for (u32 h = params.height; h > 0; h--) {
std::memcpy(write_to, start, copy_size);
@@ -722,8 +646,16 @@ void CachedSurface::LoadGLBuffer() {
}
}
for (u32 i = 0; i < params.max_mip_level; i++) {
ConvertFormatAsNeeded_LoadGLBuffer(gl_buffer[i], params.pixel_format, params.MipWidth(i),
params.MipHeight(i), params.MipDepth(i));
const u32 width = params.MipWidth(i);
const u32 height = params.MipHeight(i);
const u32 depth = params.MipDepth(i);
if (VideoCore::Surface::IsPixelFormatASTC(params.pixel_format)) {
// Reserve size for RGBA8 conversion
constexpr std::size_t rgba_bpp = 4;
gl_buffer[i].resize(std::max(gl_buffer[i].size(), width * height * depth * rgba_bpp));
}
Tegra::Texture::ConvertFromGuestToHost(gl_buffer[i].data(), params.pixel_format, width,
height, depth, true, true);
}
}
@@ -746,10 +678,8 @@ void CachedSurface::FlushGLBuffer() {
glGetTextureImage(texture.handle, 0, tuple.format, tuple.type,
static_cast<GLsizei>(gl_buffer[0].size()), gl_buffer[0].data());
glPixelStorei(GL_PACK_ROW_LENGTH, 0);
ConvertFormatAsNeeded_FlushGLBuffer(gl_buffer[0], params.pixel_format, params.width,
params.height);
const u8* const texture_src_data = Memory::GetPointer(params.addr);
ASSERT(texture_src_data);
Tegra::Texture::ConvertFromHostToGuest(gl_buffer[0].data(), params.pixel_format, params.width,
params.height, params.depth, true, true);
if (params.is_tiled) {
ASSERT_MSG(params.block_width == 1, "Block width is defined as {} on texture type {}",
params.block_width, static_cast<u32>(params.target));
@@ -759,9 +689,9 @@ void CachedSurface::FlushGLBuffer() {
const u32 bpp = params.GetFormatBpp() / 8;
const u32 copy_size = params.width * bpp;
if (params.pitch == copy_size) {
std::memcpy(Memory::GetPointer(params.addr), gl_buffer[0].data(), GetSizeInBytes());
std::memcpy(params.host_ptr, gl_buffer[0].data(), GetSizeInBytes());
} else {
u8* start = Memory::GetPointer(params.addr);
u8* start{params.host_ptr};
const u8* read_to = gl_buffer[0].data();
for (u32 h = params.height; h > 0; h--) {
std::memcpy(start, read_to, copy_size);
@@ -884,20 +814,22 @@ void CachedSurface::UploadGLMipmapTexture(u32 mip_map, GLuint read_fb_handle,
glPixelStorei(GL_UNPACK_ROW_LENGTH, 0);
}
void CachedSurface::EnsureTextureView() {
if (texture_view.handle != 0)
void CachedSurface::EnsureTextureDiscrepantView() {
if (discrepant_view.handle != 0)
return;
const GLenum target{TargetLayer()};
const GLenum target{GetArrayDiscrepantTarget(params.target)};
ASSERT(target != GL_NONE);
const GLuint num_layers{target == GL_TEXTURE_CUBE_MAP_ARRAY ? 6u : 1u};
constexpr GLuint min_layer = 0;
constexpr GLuint min_level = 0;
glGenTextures(1, &texture_view.handle);
glTextureView(texture_view.handle, target, texture.handle, gl_internal_format, min_level,
glGenTextures(1, &discrepant_view.handle);
glTextureView(discrepant_view.handle, target, texture.handle, gl_internal_format, min_level,
params.max_mip_level, min_layer, num_layers);
ApplyTextureDefaults(texture_view.handle, params.max_mip_level);
glTextureParameteriv(texture_view.handle, GL_TEXTURE_SWIZZLE_RGBA,
ApplyTextureDefaults(discrepant_view.handle, params.max_mip_level);
glTextureParameteriv(discrepant_view.handle, GL_TEXTURE_SWIZZLE_RGBA,
reinterpret_cast<const GLint*>(swizzle.data()));
}
@@ -923,8 +855,8 @@ void CachedSurface::UpdateSwizzle(Tegra::Texture::SwizzleSource swizzle_x,
swizzle = {new_x, new_y, new_z, new_w};
const auto swizzle_data = reinterpret_cast<const GLint*>(swizzle.data());
glTextureParameteriv(texture.handle, GL_TEXTURE_SWIZZLE_RGBA, swizzle_data);
if (texture_view.handle != 0) {
glTextureParameteriv(texture_view.handle, GL_TEXTURE_SWIZZLE_RGBA, swizzle_data);
if (discrepant_view.handle != 0) {
glTextureParameteriv(discrepant_view.handle, GL_TEXTURE_SWIZZLE_RGBA, swizzle_data);
}
}
@@ -998,7 +930,7 @@ Surface RasterizerCacheOpenGL::GetSurface(const SurfaceParams& params, bool pres
}
// Look up surface in the cache based on address
Surface surface{TryGet(params.addr)};
Surface surface{TryGet(params.host_ptr)};
if (surface) {
if (surface->GetSurfaceParams().IsCompatibleSurface(params)) {
// Use the cached surface as-is unless it's not synced with memory
@@ -1052,7 +984,7 @@ void RasterizerCacheOpenGL::FastLayeredCopySurface(const Surface& src_surface,
for (u32 layer = 0; layer < dst_params.depth; layer++) {
for (u32 mipmap = 0; mipmap < dst_params.max_mip_level; mipmap++) {
const VAddr sub_address = address + dst_params.GetMipmapLevelOffset(mipmap);
const Surface& copy = TryGet(sub_address);
const Surface& copy = TryGet(Memory::GetPointer(sub_address));
if (!copy)
continue;
const auto& src_params{copy->GetSurfaceParams()};
@@ -1229,7 +1161,8 @@ void RasterizerCacheOpenGL::AccurateCopySurface(const Surface& src_surface,
const auto& dst_params{dst_surface->GetSurfaceParams()};
// Flush enough memory for both the source and destination surface
FlushRegion(src_params.addr, std::max(src_params.MemorySize(), dst_params.MemorySize()));
FlushRegion(ToCacheAddr(src_params.host_ptr),
std::max(src_params.MemorySize(), dst_params.MemorySize()));
LoadSurface(dst_surface);
}
@@ -1281,8 +1214,8 @@ Surface RasterizerCacheOpenGL::RecreateSurface(const Surface& old_surface,
return new_surface;
}
Surface RasterizerCacheOpenGL::TryFindFramebufferSurface(VAddr addr) const {
return TryGet(addr);
Surface RasterizerCacheOpenGL::TryFindFramebufferSurface(const u8* host_ptr) const {
return TryGet(host_ptr);
}
void RasterizerCacheOpenGL::ReserveSurface(const Surface& surface) {
@@ -1333,7 +1266,7 @@ static bool LayerFitReinterpretSurface(RasterizerCacheOpenGL& cache, const Surfa
src_params.height == dst_params.MipHeight(*level) &&
src_params.block_height >= dst_params.MipBlockHeight(*level)) {
const std::optional<u32> slot =
TryFindBestLayer(render_surface->GetAddr(), dst_params, *level);
TryFindBestLayer(render_surface->GetCpuAddr(), dst_params, *level);
if (slot.has_value()) {
glCopyImageSubData(render_surface->Texture().handle,
SurfaceTargetToGL(src_params.target), 0, 0, 0, 0,
@@ -1349,8 +1282,8 @@ static bool LayerFitReinterpretSurface(RasterizerCacheOpenGL& cache, const Surfa
}
static bool IsReinterpretInvalid(const Surface render_surface, const Surface blitted_surface) {
const VAddr bound1 = blitted_surface->GetAddr() + blitted_surface->GetMemorySize();
const VAddr bound2 = render_surface->GetAddr() + render_surface->GetMemorySize();
const VAddr bound1 = blitted_surface->GetCpuAddr() + blitted_surface->GetMemorySize();
const VAddr bound2 = render_surface->GetCpuAddr() + render_surface->GetMemorySize();
if (bound2 > bound1)
return true;
const auto& dst_params = blitted_surface->GetSurfaceParams();
@@ -1393,7 +1326,8 @@ void RasterizerCacheOpenGL::SignalPreDrawCall() {
void RasterizerCacheOpenGL::SignalPostDrawCall() {
for (u32 i = 0; i < Maxwell::NumRenderTargets; i++) {
if (current_color_buffers[i] != nullptr) {
Surface intersect = CollideOnReinterpretedSurface(current_color_buffers[i]->GetAddr());
Surface intersect =
CollideOnReinterpretedSurface(current_color_buffers[i]->GetCacheAddr());
if (intersect != nullptr) {
PartialReinterpretSurface(current_color_buffers[i], intersect);
texception = true;

View File

@@ -297,6 +297,7 @@ struct SurfaceParams {
bool srgb_conversion;
// Parameters used for caching
VAddr addr;
u8* host_ptr;
Tegra::GPUVAddr gpu_addr;
std::size_t size_in_bytes;
std::size_t size_in_bytes_gl;
@@ -345,9 +346,9 @@ class RasterizerOpenGL;
class CachedSurface final : public RasterizerCacheObject {
public:
CachedSurface(const SurfaceParams& params);
explicit CachedSurface(const SurfaceParams& params);
VAddr GetAddr() const override {
VAddr GetCpuAddr() const override {
return params.addr;
}
@@ -367,31 +368,19 @@ public:
return texture;
}
const OGLTexture& TextureLayer() {
if (params.is_array) {
return Texture();
const OGLTexture& Texture(bool as_array) {
if (params.is_array == as_array) {
return texture;
} else {
EnsureTextureDiscrepantView();
return discrepant_view;
}
EnsureTextureView();
return texture_view;
}
GLenum Target() const {
return gl_target;
}
GLenum TargetLayer() const {
using VideoCore::Surface::SurfaceTarget;
switch (params.target) {
case SurfaceTarget::Texture1D:
return GL_TEXTURE_1D_ARRAY;
case SurfaceTarget::Texture2D:
return GL_TEXTURE_2D_ARRAY;
case SurfaceTarget::TextureCubemap:
return GL_TEXTURE_CUBE_MAP_ARRAY;
}
return Target();
}
const SurfaceParams& GetSurfaceParams() const {
return params;
}
@@ -431,10 +420,10 @@ public:
private:
void UploadGLMipmapTexture(u32 mip_map, GLuint read_fb_handle, GLuint draw_fb_handle);
void EnsureTextureView();
void EnsureTextureDiscrepantView();
OGLTexture texture;
OGLTexture texture_view;
OGLTexture discrepant_view;
std::vector<std::vector<u8>> gl_buffer;
SurfaceParams params{};
GLenum gl_target{};
@@ -461,7 +450,7 @@ public:
Surface GetColorBufferSurface(std::size_t index, bool preserve_contents);
/// Tries to find a framebuffer using on the provided CPU address
Surface TryFindFramebufferSurface(VAddr addr) const;
Surface TryFindFramebufferSurface(const u8* host_ptr) const;
/// Copies the contents of one surface to another
void FermiCopySurface(const Tegra::Engines::Fermi2D::Regs::Surface& src_config,
@@ -518,12 +507,12 @@ private:
std::array<Surface, Maxwell::NumRenderTargets> current_color_buffers;
Surface last_depth_buffer;
using SurfaceIntervalCache = boost::icl::interval_map<VAddr, Surface>;
using SurfaceIntervalCache = boost::icl::interval_map<CacheAddr, Surface>;
using SurfaceInterval = typename SurfaceIntervalCache::interval_type;
static auto GetReinterpretInterval(const Surface& object) {
return SurfaceInterval::right_open(object->GetAddr() + 1,
object->GetAddr() + object->GetMemorySize() - 1);
return SurfaceInterval::right_open(object->GetCacheAddr() + 1,
object->GetCacheAddr() + object->GetMemorySize() - 1);
}
// Reinterpreted surfaces are very fragil as the game may keep rendering into them.
@@ -535,7 +524,7 @@ private:
reinterpret_surface->MarkReinterpreted();
}
Surface CollideOnReinterpretedSurface(VAddr addr) const {
Surface CollideOnReinterpretedSurface(CacheAddr addr) const {
const SurfaceInterval interval{addr};
for (auto& pair :
boost::make_iterator_range(reinterpreted_surfaces.equal_range(interval))) {

View File

@@ -42,9 +42,9 @@ VAddr GetShaderAddress(Maxwell::ShaderProgram program) {
}
/// Gets the shader program code from memory for the specified address
ProgramCode GetShaderCode(VAddr addr) {
ProgramCode GetShaderCode(const u8* host_ptr) {
ProgramCode program_code(VideoCommon::Shader::MAX_PROGRAM_LENGTH);
Memory::ReadBlock(addr, program_code.data(), program_code.size() * sizeof(u64));
std::memcpy(program_code.data(), host_ptr, program_code.size() * sizeof(u64));
return program_code;
}
@@ -214,12 +214,13 @@ std::set<GLenum> GetSupportedFormats() {
} // namespace
CachedShader::CachedShader(VAddr addr, u64 unique_identifier, Maxwell::ShaderProgram program_type,
ShaderDiskCacheOpenGL& disk_cache,
CachedShader::CachedShader(VAddr guest_addr, u64 unique_identifier,
Maxwell::ShaderProgram program_type, ShaderDiskCacheOpenGL& disk_cache,
const PrecompiledPrograms& precompiled_programs,
ProgramCode&& program_code, ProgramCode&& program_code_b)
: addr{addr}, unique_identifier{unique_identifier}, program_type{program_type},
disk_cache{disk_cache}, precompiled_programs{precompiled_programs} {
ProgramCode&& program_code, ProgramCode&& program_code_b, u8* host_ptr)
: host_ptr{host_ptr}, guest_addr{guest_addr}, unique_identifier{unique_identifier},
program_type{program_type}, disk_cache{disk_cache},
precompiled_programs{precompiled_programs}, RasterizerCacheObject{host_ptr} {
const std::size_t code_size = CalculateProgramSize(program_code);
const std::size_t code_size_b =
@@ -243,12 +244,13 @@ CachedShader::CachedShader(VAddr addr, u64 unique_identifier, Maxwell::ShaderPro
disk_cache.SaveRaw(raw);
}
CachedShader::CachedShader(VAddr addr, u64 unique_identifier, Maxwell::ShaderProgram program_type,
ShaderDiskCacheOpenGL& disk_cache,
CachedShader::CachedShader(VAddr guest_addr, u64 unique_identifier,
Maxwell::ShaderProgram program_type, ShaderDiskCacheOpenGL& disk_cache,
const PrecompiledPrograms& precompiled_programs,
GLShader::ProgramResult result)
: addr{addr}, unique_identifier{unique_identifier}, program_type{program_type},
disk_cache{disk_cache}, precompiled_programs{precompiled_programs} {
GLShader::ProgramResult result, u8* host_ptr)
: guest_addr{guest_addr}, unique_identifier{unique_identifier}, program_type{program_type},
disk_cache{disk_cache}, precompiled_programs{precompiled_programs}, RasterizerCacheObject{
host_ptr} {
code = std::move(result.first);
entries = result.second;
@@ -271,7 +273,7 @@ std::tuple<GLuint, BaseBindings> CachedShader::GetProgramHandle(GLenum primitive
disk_cache.SaveUsage(GetUsage(primitive_mode, base_bindings));
}
LabelGLObject(GL_PROGRAM, program->handle, addr);
LabelGLObject(GL_PROGRAM, program->handle, guest_addr);
}
handle = program->handle;
@@ -323,7 +325,7 @@ GLuint CachedShader::LazyGeometryProgram(CachedProgram& target_program, BaseBind
disk_cache.SaveUsage(GetUsage(primitive_mode, base_bindings));
}
LabelGLObject(GL_PROGRAM, target_program->handle, addr, debug_name);
LabelGLObject(GL_PROGRAM, target_program->handle, guest_addr, debug_name);
return target_program->handle;
};
@@ -489,14 +491,17 @@ Shader ShaderCacheOpenGL::GetStageProgram(Maxwell::ShaderProgram program) {
const VAddr program_addr{GetShaderAddress(program)};
// Look up shader in the cache based on address
Shader shader{TryGet(program_addr)};
const auto& host_ptr{Memory::GetPointer(program_addr)};
Shader shader{TryGet(host_ptr)};
if (!shader) {
// No shader found - create a new one
ProgramCode program_code = GetShaderCode(program_addr);
const auto& host_ptr{Memory::GetPointer(program_addr)};
ProgramCode program_code{GetShaderCode(host_ptr)};
ProgramCode program_code_b;
if (program == Maxwell::ShaderProgram::VertexA) {
program_code_b = GetShaderCode(GetShaderAddress(Maxwell::ShaderProgram::VertexB));
program_code_b = GetShaderCode(
Memory::GetPointer(GetShaderAddress(Maxwell::ShaderProgram::VertexB)));
}
const u64 unique_identifier = GetUniqueIdentifier(program, program_code, program_code_b);
@@ -504,11 +509,11 @@ Shader ShaderCacheOpenGL::GetStageProgram(Maxwell::ShaderProgram program) {
if (found != precompiled_shaders.end()) {
shader =
std::make_shared<CachedShader>(program_addr, unique_identifier, program, disk_cache,
precompiled_programs, found->second);
precompiled_programs, found->second, host_ptr);
} else {
shader = std::make_shared<CachedShader>(
program_addr, unique_identifier, program, disk_cache, precompiled_programs,
std::move(program_code), std::move(program_code_b));
std::move(program_code), std::move(program_code_b), host_ptr);
}
Register(shader);
}

View File

@@ -39,18 +39,18 @@ using PrecompiledShaders = std::unordered_map<u64, GLShader::ProgramResult>;
class CachedShader final : public RasterizerCacheObject {
public:
explicit CachedShader(VAddr addr, u64 unique_identifier, Maxwell::ShaderProgram program_type,
ShaderDiskCacheOpenGL& disk_cache,
explicit CachedShader(VAddr guest_addr, u64 unique_identifier,
Maxwell::ShaderProgram program_type, ShaderDiskCacheOpenGL& disk_cache,
const PrecompiledPrograms& precompiled_programs,
ProgramCode&& program_code, ProgramCode&& program_code_b);
ProgramCode&& program_code, ProgramCode&& program_code_b, u8* host_ptr);
explicit CachedShader(VAddr addr, u64 unique_identifier, Maxwell::ShaderProgram program_type,
ShaderDiskCacheOpenGL& disk_cache,
explicit CachedShader(VAddr guest_addr, u64 unique_identifier,
Maxwell::ShaderProgram program_type, ShaderDiskCacheOpenGL& disk_cache,
const PrecompiledPrograms& precompiled_programs,
GLShader::ProgramResult result);
GLShader::ProgramResult result, u8* host_ptr);
VAddr GetAddr() const override {
return addr;
VAddr GetCpuAddr() const override {
return guest_addr;
}
std::size_t GetSizeInBytes() const override {
@@ -91,7 +91,8 @@ private:
ShaderDiskCacheUsage GetUsage(GLenum primitive_mode, BaseBindings base_bindings) const;
VAddr addr{};
u8* host_ptr{};
VAddr guest_addr{};
u64 unique_identifier{};
Maxwell::ShaderProgram program_type{};
ShaderDiskCacheOpenGL& disk_cache;

View File

@@ -5,7 +5,9 @@
#include <array>
#include <string>
#include <string_view>
#include <utility>
#include <variant>
#include <vector>
#include <fmt/format.h>
@@ -717,7 +719,7 @@ private:
}
std::string GenerateTexture(Operation operation, const std::string& func,
bool is_extra_int = false) {
const std::vector<std::pair<Type, Node>>& extras) {
constexpr std::array<const char*, 4> coord_constructors = {"float", "vec2", "vec3", "vec4"};
const auto meta = std::get_if<MetaTexture>(&operation.GetMeta());
@@ -738,36 +740,47 @@ private:
expr += Visit(operation[i]);
const std::size_t next = i + 1;
if (next < count || has_array || has_shadow)
if (next < count)
expr += ", ";
}
if (has_array) {
expr += "float(ftoi(" + Visit(meta->array) + "))";
expr += ", float(ftoi(" + Visit(meta->array) + "))";
}
if (has_shadow) {
if (has_array)
expr += ", ";
expr += Visit(meta->depth_compare);
expr += ", " + Visit(meta->depth_compare);
}
expr += ')';
for (const Node extra : meta->extras) {
for (const auto& extra_pair : extras) {
const auto [type, operand] = extra_pair;
if (operand == nullptr) {
continue;
}
expr += ", ";
if (is_extra_int) {
if (const auto immediate = std::get_if<ImmediateNode>(extra)) {
switch (type) {
case Type::Int:
if (const auto immediate = std::get_if<ImmediateNode>(operand)) {
// Inline the string as an immediate integer in GLSL (some extra arguments are
// required to be constant)
expr += std::to_string(static_cast<s32>(immediate->GetValue()));
} else {
expr += "ftoi(" + Visit(extra) + ')';
expr += "ftoi(" + Visit(operand) + ')';
}
} else {
expr += Visit(extra);
break;
case Type::Float:
expr += Visit(operand);
break;
default: {
const auto type_int = static_cast<u32>(type);
UNIMPLEMENTED_MSG("Unimplemented extra type={}", type_int);
expr += '0';
break;
}
}
}
expr += ')';
return expr;
return expr + ')';
}
std::string Assign(Operation operation) {
@@ -1146,7 +1159,7 @@ private:
const auto meta = std::get_if<MetaTexture>(&operation.GetMeta());
ASSERT(meta);
std::string expr = GenerateTexture(operation, "texture");
std::string expr = GenerateTexture(operation, "texture", {{Type::Float, meta->bias}});
if (meta->sampler.IsShadow()) {
expr = "vec4(" + expr + ')';
}
@@ -1157,7 +1170,7 @@ private:
const auto meta = std::get_if<MetaTexture>(&operation.GetMeta());
ASSERT(meta);
std::string expr = GenerateTexture(operation, "textureLod");
std::string expr = GenerateTexture(operation, "textureLod", {{Type::Float, meta->lod}});
if (meta->sampler.IsShadow()) {
expr = "vec4(" + expr + ')';
}
@@ -1168,7 +1181,8 @@ private:
const auto meta = std::get_if<MetaTexture>(&operation.GetMeta());
ASSERT(meta);
return GenerateTexture(operation, "textureGather", !meta->sampler.IsShadow()) +
const auto type = meta->sampler.IsShadow() ? Type::Float : Type::Int;
return GenerateTexture(operation, "textureGather", {{type, meta->component}}) +
GetSwizzle(meta->element);
}
@@ -1197,8 +1211,8 @@ private:
ASSERT(meta);
if (meta->element < 2) {
return "itof(int((" + GenerateTexture(operation, "textureQueryLod") + " * vec2(256))" +
GetSwizzle(meta->element) + "))";
return "itof(int((" + GenerateTexture(operation, "textureQueryLod", {}) +
" * vec2(256))" + GetSwizzle(meta->element) + "))";
}
return "0";
}
@@ -1224,9 +1238,9 @@ private:
else if (next < count)
expr += ", ";
}
for (std::size_t i = 0; i < meta->extras.size(); ++i) {
if (meta->lod) {
expr += ", ";
expr += CastOperand(Visit(meta->extras.at(i)), Type::Int);
expr += CastOperand(Visit(meta->lod), Type::Int);
}
expr += ')';

View File

@@ -164,12 +164,13 @@ void RendererOpenGL::LoadFBToScreenInfo(const Tegra::FramebufferConfig& framebuf
// Reset the screen info's display texture to its own permanent texture
screen_info.display_texture = screen_info.texture.resource.handle;
Memory::RasterizerFlushVirtualRegion(framebuffer_addr, size_in_bytes,
Memory::FlushMode::Flush);
rasterizer->FlushRegion(ToCacheAddr(Memory::GetPointer(framebuffer_addr)), size_in_bytes);
VideoCore::MortonCopyPixels128(framebuffer.width, framebuffer.height, bytes_per_pixel, 4,
Memory::GetPointer(framebuffer_addr),
gl_framebuffer_data.data(), true);
constexpr u32 linear_bpp = 4;
VideoCore::MortonCopyPixels128(VideoCore::MortonSwizzleMode::MortonToLinear,
framebuffer.width, framebuffer.height, bytes_per_pixel,
linear_bpp, Memory::GetPointer(framebuffer_addr),
gl_framebuffer_data.data());
glPixelStorei(GL_UNPACK_ROW_LENGTH, static_cast<GLint>(framebuffer.stride));
@@ -244,6 +245,21 @@ void RendererOpenGL::InitOpenGLObjects() {
LoadColorToActiveGLTexture(0, 0, 0, 0, screen_info.texture);
}
void RendererOpenGL::AddTelemetryFields() {
const char* const gl_version{reinterpret_cast<char const*>(glGetString(GL_VERSION))};
const char* const gpu_vendor{reinterpret_cast<char const*>(glGetString(GL_VENDOR))};
const char* const gpu_model{reinterpret_cast<char const*>(glGetString(GL_RENDERER))};
LOG_INFO(Render_OpenGL, "GL_VERSION: {}", gl_version);
LOG_INFO(Render_OpenGL, "GL_VENDOR: {}", gpu_vendor);
LOG_INFO(Render_OpenGL, "GL_RENDERER: {}", gpu_model);
auto& telemetry_session = system.TelemetrySession();
telemetry_session.AddField(Telemetry::FieldType::UserSystem, "GPU_Vendor", gpu_vendor);
telemetry_session.AddField(Telemetry::FieldType::UserSystem, "GPU_Model", gpu_model);
telemetry_session.AddField(Telemetry::FieldType::UserSystem, "GPU_OpenGL_Version", gl_version);
}
void RendererOpenGL::CreateRasterizer() {
if (rasterizer) {
return;
@@ -466,17 +482,7 @@ bool RendererOpenGL::Init() {
glDebugMessageCallback(DebugHandler, nullptr);
}
const char* gl_version{reinterpret_cast<char const*>(glGetString(GL_VERSION))};
const char* gpu_vendor{reinterpret_cast<char const*>(glGetString(GL_VENDOR))};
const char* gpu_model{reinterpret_cast<char const*>(glGetString(GL_RENDERER))};
LOG_INFO(Render_OpenGL, "GL_VERSION: {}", gl_version);
LOG_INFO(Render_OpenGL, "GL_VENDOR: {}", gpu_vendor);
LOG_INFO(Render_OpenGL, "GL_RENDERER: {}", gpu_model);
Core::Telemetry().AddField(Telemetry::FieldType::UserSystem, "GPU_Vendor", gpu_vendor);
Core::Telemetry().AddField(Telemetry::FieldType::UserSystem, "GPU_Model", gpu_model);
Core::Telemetry().AddField(Telemetry::FieldType::UserSystem, "GPU_OpenGL_Version", gl_version);
AddTelemetryFields();
if (!GLAD_GL_VERSION_4_3) {
return false;

View File

@@ -60,6 +60,7 @@ public:
private:
void InitOpenGLObjects();
void AddTelemetryFields();
void CreateRasterizer();
void ConfigureFramebufferTexture(TextureInfo& texture,

View File

@@ -0,0 +1,483 @@
// Copyright 2019 yuzu Emulator Project
// Licensed under GPLv2 or any later version
// Refer to the license.txt file included.
#include "common/assert.h"
#include "common/common_types.h"
#include "common/logging/log.h"
#include "video_core/engines/maxwell_3d.h"
#include "video_core/renderer_vulkan/declarations.h"
#include "video_core/renderer_vulkan/maxwell_to_vk.h"
#include "video_core/renderer_vulkan/vk_device.h"
#include "video_core/surface.h"
namespace Vulkan::MaxwellToVK {
namespace Sampler {
vk::Filter Filter(Tegra::Texture::TextureFilter filter) {
switch (filter) {
case Tegra::Texture::TextureFilter::Linear:
return vk::Filter::eLinear;
case Tegra::Texture::TextureFilter::Nearest:
return vk::Filter::eNearest;
}
UNIMPLEMENTED_MSG("Unimplemented sampler filter={}", static_cast<u32>(filter));
return {};
}
vk::SamplerMipmapMode MipmapMode(Tegra::Texture::TextureMipmapFilter mipmap_filter) {
switch (mipmap_filter) {
case Tegra::Texture::TextureMipmapFilter::None:
// TODO(Rodrigo): None seems to be mapped to OpenGL's mag and min filters without mipmapping
// (e.g. GL_NEAREST and GL_LINEAR). Vulkan doesn't have such a thing, find out if we have to
// use an image view with a single mipmap level to emulate this.
return vk::SamplerMipmapMode::eLinear;
case Tegra::Texture::TextureMipmapFilter::Linear:
return vk::SamplerMipmapMode::eLinear;
case Tegra::Texture::TextureMipmapFilter::Nearest:
return vk::SamplerMipmapMode::eNearest;
}
UNIMPLEMENTED_MSG("Unimplemented sampler mipmap mode={}", static_cast<u32>(mipmap_filter));
return {};
}
vk::SamplerAddressMode WrapMode(Tegra::Texture::WrapMode wrap_mode) {
switch (wrap_mode) {
case Tegra::Texture::WrapMode::Wrap:
return vk::SamplerAddressMode::eRepeat;
case Tegra::Texture::WrapMode::Mirror:
return vk::SamplerAddressMode::eMirroredRepeat;
case Tegra::Texture::WrapMode::ClampToEdge:
return vk::SamplerAddressMode::eClampToEdge;
case Tegra::Texture::WrapMode::Border:
return vk::SamplerAddressMode::eClampToBorder;
case Tegra::Texture::WrapMode::ClampOGL:
// TODO(Rodrigo): GL_CLAMP was removed as of OpenGL 3.1, to implement GL_CLAMP, we can use
// eClampToBorder to get the border color of the texture, and then sample the edge to
// manually mix them. However the shader part of this is not yet implemented.
return vk::SamplerAddressMode::eClampToBorder;
case Tegra::Texture::WrapMode::MirrorOnceClampToEdge:
return vk::SamplerAddressMode::eMirrorClampToEdge;
case Tegra::Texture::WrapMode::MirrorOnceBorder:
UNIMPLEMENTED();
return vk::SamplerAddressMode::eMirrorClampToEdge;
}
UNIMPLEMENTED_MSG("Unimplemented wrap mode={}", static_cast<u32>(wrap_mode));
return {};
}
vk::CompareOp DepthCompareFunction(Tegra::Texture::DepthCompareFunc depth_compare_func) {
switch (depth_compare_func) {
case Tegra::Texture::DepthCompareFunc::Never:
return vk::CompareOp::eNever;
case Tegra::Texture::DepthCompareFunc::Less:
return vk::CompareOp::eLess;
case Tegra::Texture::DepthCompareFunc::LessEqual:
return vk::CompareOp::eLessOrEqual;
case Tegra::Texture::DepthCompareFunc::Equal:
return vk::CompareOp::eEqual;
case Tegra::Texture::DepthCompareFunc::NotEqual:
return vk::CompareOp::eNotEqual;
case Tegra::Texture::DepthCompareFunc::Greater:
return vk::CompareOp::eGreater;
case Tegra::Texture::DepthCompareFunc::GreaterEqual:
return vk::CompareOp::eGreaterOrEqual;
case Tegra::Texture::DepthCompareFunc::Always:
return vk::CompareOp::eAlways;
}
UNIMPLEMENTED_MSG("Unimplemented sampler depth compare function={}",
static_cast<u32>(depth_compare_func));
return {};
}
} // namespace Sampler
struct FormatTuple {
vk::Format format; ///< Vulkan format
ComponentType component_type; ///< Abstracted component type
bool attachable; ///< True when this format can be used as an attachment
};
static constexpr std::array<FormatTuple, VideoCore::Surface::MaxPixelFormat> tex_format_tuples = {{
{vk::Format::eA8B8G8R8UnormPack32, ComponentType::UNorm, true}, // ABGR8U
{vk::Format::eUndefined, ComponentType::Invalid, false}, // ABGR8S
{vk::Format::eUndefined, ComponentType::Invalid, false}, // ABGR8UI
{vk::Format::eB5G6R5UnormPack16, ComponentType::UNorm, false}, // B5G6R5U
{vk::Format::eA2B10G10R10UnormPack32, ComponentType::UNorm, true}, // A2B10G10R10U
{vk::Format::eUndefined, ComponentType::Invalid, false}, // A1B5G5R5U
{vk::Format::eR8Unorm, ComponentType::UNorm, true}, // R8U
{vk::Format::eUndefined, ComponentType::Invalid, false}, // R8UI
{vk::Format::eUndefined, ComponentType::Invalid, false}, // RGBA16F
{vk::Format::eUndefined, ComponentType::Invalid, false}, // RGBA16U
{vk::Format::eUndefined, ComponentType::Invalid, false}, // RGBA16UI
{vk::Format::eUndefined, ComponentType::Invalid, false}, // R11FG11FB10F
{vk::Format::eUndefined, ComponentType::Invalid, false}, // RGBA32UI
{vk::Format::eBc1RgbaUnormBlock, ComponentType::UNorm, false}, // DXT1
{vk::Format::eBc2UnormBlock, ComponentType::UNorm, false}, // DXT23
{vk::Format::eBc3UnormBlock, ComponentType::UNorm, false}, // DXT45
{vk::Format::eBc4UnormBlock, ComponentType::UNorm, false}, // DXN1
{vk::Format::eUndefined, ComponentType::Invalid, false}, // DXN2UNORM
{vk::Format::eUndefined, ComponentType::Invalid, false}, // DXN2SNORM
{vk::Format::eUndefined, ComponentType::Invalid, false}, // BC7U
{vk::Format::eUndefined, ComponentType::Invalid, false}, // BC6H_UF16
{vk::Format::eUndefined, ComponentType::Invalid, false}, // BC6H_SF16
{vk::Format::eUndefined, ComponentType::Invalid, false}, // ASTC_2D_4X4
{vk::Format::eUndefined, ComponentType::Invalid, false}, // BGRA8
{vk::Format::eUndefined, ComponentType::Invalid, false}, // RGBA32F
{vk::Format::eUndefined, ComponentType::Invalid, false}, // RG32F
{vk::Format::eUndefined, ComponentType::Invalid, false}, // R32F
{vk::Format::eUndefined, ComponentType::Invalid, false}, // R16F
{vk::Format::eUndefined, ComponentType::Invalid, false}, // R16U
{vk::Format::eUndefined, ComponentType::Invalid, false}, // R16S
{vk::Format::eUndefined, ComponentType::Invalid, false}, // R16UI
{vk::Format::eUndefined, ComponentType::Invalid, false}, // R16I
{vk::Format::eUndefined, ComponentType::Invalid, false}, // RG16
{vk::Format::eUndefined, ComponentType::Invalid, false}, // RG16F
{vk::Format::eUndefined, ComponentType::Invalid, false}, // RG16UI
{vk::Format::eUndefined, ComponentType::Invalid, false}, // RG16I
{vk::Format::eUndefined, ComponentType::Invalid, false}, // RG16S
{vk::Format::eUndefined, ComponentType::Invalid, false}, // RGB32F
{vk::Format::eA8B8G8R8SrgbPack32, ComponentType::UNorm, true}, // RGBA8_SRGB
{vk::Format::eUndefined, ComponentType::Invalid, false}, // RG8U
{vk::Format::eUndefined, ComponentType::Invalid, false}, // RG8S
{vk::Format::eUndefined, ComponentType::Invalid, false}, // RG32UI
{vk::Format::eUndefined, ComponentType::Invalid, false}, // R32UI
{vk::Format::eUndefined, ComponentType::Invalid, false}, // ASTC_2D_8X8
{vk::Format::eUndefined, ComponentType::Invalid, false}, // ASTC_2D_8X5
{vk::Format::eUndefined, ComponentType::Invalid, false}, // ASTC_2D_5X4
// Compressed sRGB formats
{vk::Format::eUndefined, ComponentType::Invalid, false}, // BGRA8_SRGB
{vk::Format::eUndefined, ComponentType::Invalid, false}, // DXT1_SRGB
{vk::Format::eUndefined, ComponentType::Invalid, false}, // DXT23_SRGB
{vk::Format::eUndefined, ComponentType::Invalid, false}, // DXT45_SRGB
{vk::Format::eUndefined, ComponentType::Invalid, false}, // BC7U_SRGB
{vk::Format::eUndefined, ComponentType::Invalid, false}, // ASTC_2D_4X4_SRGB
{vk::Format::eUndefined, ComponentType::Invalid, false}, // ASTC_2D_8X8_SRGB
{vk::Format::eUndefined, ComponentType::Invalid, false}, // ASTC_2D_8X5_SRGB
{vk::Format::eUndefined, ComponentType::Invalid, false}, // ASTC_2D_5X4_SRGB
{vk::Format::eUndefined, ComponentType::Invalid, false}, // ASTC_2D_5X5
{vk::Format::eUndefined, ComponentType::Invalid, false}, // ASTC_2D_5X5_SRGB
{vk::Format::eUndefined, ComponentType::Invalid, false}, // ASTC_2D_10X8
{vk::Format::eUndefined, ComponentType::Invalid, false}, // ASTC_2D_10X8_SRGB
// Depth formats
{vk::Format::eD32Sfloat, ComponentType::Float, true}, // Z32F
{vk::Format::eD16Unorm, ComponentType::UNorm, true}, // Z16
// DepthStencil formats
{vk::Format::eD24UnormS8Uint, ComponentType::UNorm, true}, // Z24S8
{vk::Format::eD24UnormS8Uint, ComponentType::UNorm, true}, // S8Z24 (emulated)
{vk::Format::eUndefined, ComponentType::Invalid, false}, // Z32FS8
}};
static constexpr bool IsZetaFormat(PixelFormat pixel_format) {
return pixel_format >= PixelFormat::MaxColorFormat &&
pixel_format < PixelFormat::MaxDepthStencilFormat;
}
std::pair<vk::Format, bool> SurfaceFormat(const VKDevice& device, FormatType format_type,
PixelFormat pixel_format, ComponentType component_type) {
ASSERT(static_cast<std::size_t>(pixel_format) < tex_format_tuples.size());
const auto tuple = tex_format_tuples[static_cast<u32>(pixel_format)];
UNIMPLEMENTED_IF_MSG(tuple.format == vk::Format::eUndefined,
"Unimplemented texture format with pixel format={} and component type={}",
static_cast<u32>(pixel_format), static_cast<u32>(component_type));
ASSERT_MSG(component_type == tuple.component_type, "Component type mismatch");
auto usage = vk::FormatFeatureFlagBits::eSampledImage |
vk::FormatFeatureFlagBits::eTransferDst | vk::FormatFeatureFlagBits::eTransferSrc;
if (tuple.attachable) {
usage |= IsZetaFormat(pixel_format) ? vk::FormatFeatureFlagBits::eDepthStencilAttachment
: vk::FormatFeatureFlagBits::eColorAttachment;
}
return {device.GetSupportedFormat(tuple.format, usage, format_type), tuple.attachable};
}
vk::ShaderStageFlagBits ShaderStage(Maxwell::ShaderStage stage) {
switch (stage) {
case Maxwell::ShaderStage::Vertex:
return vk::ShaderStageFlagBits::eVertex;
case Maxwell::ShaderStage::TesselationControl:
return vk::ShaderStageFlagBits::eTessellationControl;
case Maxwell::ShaderStage::TesselationEval:
return vk::ShaderStageFlagBits::eTessellationEvaluation;
case Maxwell::ShaderStage::Geometry:
return vk::ShaderStageFlagBits::eGeometry;
case Maxwell::ShaderStage::Fragment:
return vk::ShaderStageFlagBits::eFragment;
}
UNIMPLEMENTED_MSG("Unimplemented shader stage={}", static_cast<u32>(stage));
return {};
}
vk::PrimitiveTopology PrimitiveTopology(Maxwell::PrimitiveTopology topology) {
switch (topology) {
case Maxwell::PrimitiveTopology::Points:
return vk::PrimitiveTopology::ePointList;
case Maxwell::PrimitiveTopology::Lines:
return vk::PrimitiveTopology::eLineList;
case Maxwell::PrimitiveTopology::LineStrip:
return vk::PrimitiveTopology::eLineStrip;
case Maxwell::PrimitiveTopology::Triangles:
return vk::PrimitiveTopology::eTriangleList;
case Maxwell::PrimitiveTopology::TriangleStrip:
return vk::PrimitiveTopology::eTriangleStrip;
}
UNIMPLEMENTED_MSG("Unimplemented topology={}", static_cast<u32>(topology));
return {};
}
vk::Format VertexFormat(Maxwell::VertexAttribute::Type type, Maxwell::VertexAttribute::Size size) {
switch (type) {
case Maxwell::VertexAttribute::Type::SignedNorm:
break;
case Maxwell::VertexAttribute::Type::UnsignedNorm:
switch (size) {
case Maxwell::VertexAttribute::Size::Size_8_8_8_8:
return vk::Format::eR8G8B8A8Unorm;
default:
break;
}
break;
case Maxwell::VertexAttribute::Type::SignedInt:
break;
case Maxwell::VertexAttribute::Type::UnsignedInt:
switch (size) {
case Maxwell::VertexAttribute::Size::Size_32:
return vk::Format::eR32Uint;
default:
break;
}
case Maxwell::VertexAttribute::Type::UnsignedScaled:
case Maxwell::VertexAttribute::Type::SignedScaled:
break;
case Maxwell::VertexAttribute::Type::Float:
switch (size) {
case Maxwell::VertexAttribute::Size::Size_32_32_32_32:
return vk::Format::eR32G32B32A32Sfloat;
case Maxwell::VertexAttribute::Size::Size_32_32_32:
return vk::Format::eR32G32B32Sfloat;
case Maxwell::VertexAttribute::Size::Size_32_32:
return vk::Format::eR32G32Sfloat;
case Maxwell::VertexAttribute::Size::Size_32:
return vk::Format::eR32Sfloat;
default:
break;
}
break;
}
UNIMPLEMENTED_MSG("Unimplemented vertex format of type={} and size={}", static_cast<u32>(type),
static_cast<u32>(size));
return {};
}
vk::CompareOp ComparisonOp(Maxwell::ComparisonOp comparison) {
switch (comparison) {
case Maxwell::ComparisonOp::Never:
case Maxwell::ComparisonOp::NeverOld:
return vk::CompareOp::eNever;
case Maxwell::ComparisonOp::Less:
case Maxwell::ComparisonOp::LessOld:
return vk::CompareOp::eLess;
case Maxwell::ComparisonOp::Equal:
case Maxwell::ComparisonOp::EqualOld:
return vk::CompareOp::eEqual;
case Maxwell::ComparisonOp::LessEqual:
case Maxwell::ComparisonOp::LessEqualOld:
return vk::CompareOp::eLessOrEqual;
case Maxwell::ComparisonOp::Greater:
case Maxwell::ComparisonOp::GreaterOld:
return vk::CompareOp::eGreater;
case Maxwell::ComparisonOp::NotEqual:
case Maxwell::ComparisonOp::NotEqualOld:
return vk::CompareOp::eNotEqual;
case Maxwell::ComparisonOp::GreaterEqual:
case Maxwell::ComparisonOp::GreaterEqualOld:
return vk::CompareOp::eGreaterOrEqual;
case Maxwell::ComparisonOp::Always:
case Maxwell::ComparisonOp::AlwaysOld:
return vk::CompareOp::eAlways;
}
UNIMPLEMENTED_MSG("Unimplemented comparison op={}", static_cast<u32>(comparison));
return {};
}
vk::IndexType IndexFormat(Maxwell::IndexFormat index_format) {
switch (index_format) {
case Maxwell::IndexFormat::UnsignedByte:
UNIMPLEMENTED_MSG("Vulkan does not support native u8 index format");
return vk::IndexType::eUint16;
case Maxwell::IndexFormat::UnsignedShort:
return vk::IndexType::eUint16;
case Maxwell::IndexFormat::UnsignedInt:
return vk::IndexType::eUint32;
}
UNIMPLEMENTED_MSG("Unimplemented index_format={}", static_cast<u32>(index_format));
return {};
}
vk::StencilOp StencilOp(Maxwell::StencilOp stencil_op) {
switch (stencil_op) {
case Maxwell::StencilOp::Keep:
case Maxwell::StencilOp::KeepOGL:
return vk::StencilOp::eKeep;
case Maxwell::StencilOp::Zero:
case Maxwell::StencilOp::ZeroOGL:
return vk::StencilOp::eZero;
case Maxwell::StencilOp::Replace:
case Maxwell::StencilOp::ReplaceOGL:
return vk::StencilOp::eReplace;
case Maxwell::StencilOp::Incr:
case Maxwell::StencilOp::IncrOGL:
return vk::StencilOp::eIncrementAndClamp;
case Maxwell::StencilOp::Decr:
case Maxwell::StencilOp::DecrOGL:
return vk::StencilOp::eDecrementAndClamp;
case Maxwell::StencilOp::Invert:
case Maxwell::StencilOp::InvertOGL:
return vk::StencilOp::eInvert;
case Maxwell::StencilOp::IncrWrap:
case Maxwell::StencilOp::IncrWrapOGL:
return vk::StencilOp::eIncrementAndWrap;
case Maxwell::StencilOp::DecrWrap:
case Maxwell::StencilOp::DecrWrapOGL:
return vk::StencilOp::eDecrementAndWrap;
}
UNIMPLEMENTED_MSG("Unimplemented stencil op={}", static_cast<u32>(stencil_op));
return {};
}
vk::BlendOp BlendEquation(Maxwell::Blend::Equation equation) {
switch (equation) {
case Maxwell::Blend::Equation::Add:
case Maxwell::Blend::Equation::AddGL:
return vk::BlendOp::eAdd;
case Maxwell::Blend::Equation::Subtract:
case Maxwell::Blend::Equation::SubtractGL:
return vk::BlendOp::eSubtract;
case Maxwell::Blend::Equation::ReverseSubtract:
case Maxwell::Blend::Equation::ReverseSubtractGL:
return vk::BlendOp::eReverseSubtract;
case Maxwell::Blend::Equation::Min:
case Maxwell::Blend::Equation::MinGL:
return vk::BlendOp::eMin;
case Maxwell::Blend::Equation::Max:
case Maxwell::Blend::Equation::MaxGL:
return vk::BlendOp::eMax;
}
UNIMPLEMENTED_MSG("Unimplemented blend equation={}", static_cast<u32>(equation));
return {};
}
vk::BlendFactor BlendFactor(Maxwell::Blend::Factor factor) {
switch (factor) {
case Maxwell::Blend::Factor::Zero:
case Maxwell::Blend::Factor::ZeroGL:
return vk::BlendFactor::eZero;
case Maxwell::Blend::Factor::One:
case Maxwell::Blend::Factor::OneGL:
return vk::BlendFactor::eOne;
case Maxwell::Blend::Factor::SourceColor:
case Maxwell::Blend::Factor::SourceColorGL:
return vk::BlendFactor::eSrcColor;
case Maxwell::Blend::Factor::OneMinusSourceColor:
case Maxwell::Blend::Factor::OneMinusSourceColorGL:
return vk::BlendFactor::eOneMinusSrcColor;
case Maxwell::Blend::Factor::SourceAlpha:
case Maxwell::Blend::Factor::SourceAlphaGL:
return vk::BlendFactor::eSrcAlpha;
case Maxwell::Blend::Factor::OneMinusSourceAlpha:
case Maxwell::Blend::Factor::OneMinusSourceAlphaGL:
return vk::BlendFactor::eOneMinusSrcAlpha;
case Maxwell::Blend::Factor::DestAlpha:
case Maxwell::Blend::Factor::DestAlphaGL:
return vk::BlendFactor::eDstAlpha;
case Maxwell::Blend::Factor::OneMinusDestAlpha:
case Maxwell::Blend::Factor::OneMinusDestAlphaGL:
return vk::BlendFactor::eOneMinusDstAlpha;
case Maxwell::Blend::Factor::DestColor:
case Maxwell::Blend::Factor::DestColorGL:
return vk::BlendFactor::eDstColor;
case Maxwell::Blend::Factor::OneMinusDestColor:
case Maxwell::Blend::Factor::OneMinusDestColorGL:
return vk::BlendFactor::eOneMinusDstColor;
case Maxwell::Blend::Factor::SourceAlphaSaturate:
case Maxwell::Blend::Factor::SourceAlphaSaturateGL:
return vk::BlendFactor::eSrcAlphaSaturate;
case Maxwell::Blend::Factor::Source1Color:
case Maxwell::Blend::Factor::Source1ColorGL:
return vk::BlendFactor::eSrc1Color;
case Maxwell::Blend::Factor::OneMinusSource1Color:
case Maxwell::Blend::Factor::OneMinusSource1ColorGL:
return vk::BlendFactor::eOneMinusSrc1Color;
case Maxwell::Blend::Factor::Source1Alpha:
case Maxwell::Blend::Factor::Source1AlphaGL:
return vk::BlendFactor::eSrc1Alpha;
case Maxwell::Blend::Factor::OneMinusSource1Alpha:
case Maxwell::Blend::Factor::OneMinusSource1AlphaGL:
return vk::BlendFactor::eOneMinusSrc1Alpha;
case Maxwell::Blend::Factor::ConstantColor:
case Maxwell::Blend::Factor::ConstantColorGL:
return vk::BlendFactor::eConstantColor;
case Maxwell::Blend::Factor::OneMinusConstantColor:
case Maxwell::Blend::Factor::OneMinusConstantColorGL:
return vk::BlendFactor::eOneMinusConstantColor;
case Maxwell::Blend::Factor::ConstantAlpha:
case Maxwell::Blend::Factor::ConstantAlphaGL:
return vk::BlendFactor::eConstantAlpha;
case Maxwell::Blend::Factor::OneMinusConstantAlpha:
case Maxwell::Blend::Factor::OneMinusConstantAlphaGL:
return vk::BlendFactor::eOneMinusConstantAlpha;
}
UNIMPLEMENTED_MSG("Unimplemented blend factor={}", static_cast<u32>(factor));
return {};
}
vk::FrontFace FrontFace(Maxwell::Cull::FrontFace front_face) {
switch (front_face) {
case Maxwell::Cull::FrontFace::ClockWise:
return vk::FrontFace::eClockwise;
case Maxwell::Cull::FrontFace::CounterClockWise:
return vk::FrontFace::eCounterClockwise;
}
UNIMPLEMENTED_MSG("Unimplemented front face={}", static_cast<u32>(front_face));
return {};
}
vk::CullModeFlags CullFace(Maxwell::Cull::CullFace cull_face) {
switch (cull_face) {
case Maxwell::Cull::CullFace::Front:
return vk::CullModeFlagBits::eFront;
case Maxwell::Cull::CullFace::Back:
return vk::CullModeFlagBits::eBack;
case Maxwell::Cull::CullFace::FrontAndBack:
return vk::CullModeFlagBits::eFrontAndBack;
}
UNIMPLEMENTED_MSG("Unimplemented cull face={}", static_cast<u32>(cull_face));
return {};
}
vk::ComponentSwizzle SwizzleSource(Tegra::Texture::SwizzleSource swizzle) {
switch (swizzle) {
case Tegra::Texture::SwizzleSource::Zero:
return vk::ComponentSwizzle::eZero;
case Tegra::Texture::SwizzleSource::R:
return vk::ComponentSwizzle::eR;
case Tegra::Texture::SwizzleSource::G:
return vk::ComponentSwizzle::eG;
case Tegra::Texture::SwizzleSource::B:
return vk::ComponentSwizzle::eB;
case Tegra::Texture::SwizzleSource::A:
return vk::ComponentSwizzle::eA;
case Tegra::Texture::SwizzleSource::OneInt:
case Tegra::Texture::SwizzleSource::OneFloat:
return vk::ComponentSwizzle::eOne;
}
UNIMPLEMENTED_MSG("Unimplemented swizzle source={}", static_cast<u32>(swizzle));
return {};
}
} // namespace Vulkan::MaxwellToVK

View File

@@ -0,0 +1,58 @@
// Copyright 2019 yuzu Emulator Project
// Licensed under GPLv2 or any later version
// Refer to the license.txt file included.
#pragma once
#include <utility>
#include "common/common_types.h"
#include "video_core/engines/maxwell_3d.h"
#include "video_core/renderer_vulkan/declarations.h"
#include "video_core/renderer_vulkan/vk_device.h"
#include "video_core/surface.h"
#include "video_core/textures/texture.h"
namespace Vulkan::MaxwellToVK {
using Maxwell = Tegra::Engines::Maxwell3D::Regs;
using PixelFormat = VideoCore::Surface::PixelFormat;
using ComponentType = VideoCore::Surface::ComponentType;
namespace Sampler {
vk::Filter Filter(Tegra::Texture::TextureFilter filter);
vk::SamplerMipmapMode MipmapMode(Tegra::Texture::TextureMipmapFilter mipmap_filter);
vk::SamplerAddressMode WrapMode(Tegra::Texture::WrapMode wrap_mode);
vk::CompareOp DepthCompareFunction(Tegra::Texture::DepthCompareFunc depth_compare_func);
} // namespace Sampler
std::pair<vk::Format, bool> SurfaceFormat(const VKDevice& device, FormatType format_type,
PixelFormat pixel_format, ComponentType component_type);
vk::ShaderStageFlagBits ShaderStage(Maxwell::ShaderStage stage);
vk::PrimitiveTopology PrimitiveTopology(Maxwell::PrimitiveTopology topology);
vk::Format VertexFormat(Maxwell::VertexAttribute::Type type, Maxwell::VertexAttribute::Size size);
vk::CompareOp ComparisonOp(Maxwell::ComparisonOp comparison);
vk::IndexType IndexFormat(Maxwell::IndexFormat index_format);
vk::StencilOp StencilOp(Maxwell::StencilOp stencil_op);
vk::BlendOp BlendEquation(Maxwell::Blend::Equation equation);
vk::BlendFactor BlendFactor(Maxwell::Blend::Factor factor);
vk::FrontFace FrontFace(Maxwell::Cull::FrontFace front_face);
vk::CullModeFlags CullFace(Maxwell::Cull::CullFace cull_face);
vk::ComponentSwizzle SwizzleSource(Tegra::Texture::SwizzleSource swizzle);
} // namespace Vulkan::MaxwellToVK

View File

@@ -17,6 +17,11 @@
namespace Vulkan {
CachedBufferEntry::CachedBufferEntry(VAddr cpu_addr, std::size_t size, u64 offset,
std::size_t alignment, u8* host_ptr)
: cpu_addr{cpu_addr}, size{size}, offset{offset}, alignment{alignment}, RasterizerCacheObject{
host_ptr} {}
VKBufferCache::VKBufferCache(Tegra::MemoryManager& tegra_memory_manager,
VideoCore::RasterizerInterface& rasterizer, const VKDevice& device,
VKMemoryManager& memory_manager, VKScheduler& scheduler, u64 size)
@@ -37,16 +42,18 @@ VKBufferCache::~VKBufferCache() = default;
u64 VKBufferCache::UploadMemory(Tegra::GPUVAddr gpu_addr, std::size_t size, u64 alignment,
bool cache) {
const auto cpu_addr{tegra_memory_manager.GpuToCpuAddress(gpu_addr)};
ASSERT(cpu_addr);
ASSERT_MSG(cpu_addr, "Invalid GPU address");
// Cache management is a big overhead, so only cache entries with a given size.
// TODO: Figure out which size is the best for given games.
cache &= size >= 2048;
const auto& host_ptr{Memory::GetPointer(*cpu_addr)};
if (cache) {
if (auto entry = TryGet(*cpu_addr); entry) {
if (entry->size >= size && entry->alignment == alignment) {
return entry->offset;
auto entry = TryGet(host_ptr);
if (entry) {
if (entry->GetSize() >= size && entry->GetAlignment() == alignment) {
return entry->GetOffset();
}
Unregister(entry);
}
@@ -55,17 +62,17 @@ u64 VKBufferCache::UploadMemory(Tegra::GPUVAddr gpu_addr, std::size_t size, u64
AlignBuffer(alignment);
const u64 uploaded_offset = buffer_offset;
Memory::ReadBlock(*cpu_addr, buffer_ptr, size);
if (!host_ptr) {
return uploaded_offset;
}
std::memcpy(buffer_ptr, host_ptr, size);
buffer_ptr += size;
buffer_offset += size;
if (cache) {
auto entry = std::make_shared<CachedBufferEntry>();
entry->offset = uploaded_offset;
entry->size = size;
entry->alignment = alignment;
entry->addr = *cpu_addr;
auto entry = std::make_shared<CachedBufferEntry>(*cpu_addr, size, uploaded_offset,
alignment, host_ptr);
Register(entry);
}

View File

@@ -24,22 +24,39 @@ class VKFence;
class VKMemoryManager;
class VKStreamBuffer;
struct CachedBufferEntry final : public RasterizerCacheObject {
VAddr GetAddr() const override {
return addr;
class CachedBufferEntry final : public RasterizerCacheObject {
public:
explicit CachedBufferEntry(VAddr cpu_addr, std::size_t size, u64 offset, std::size_t alignment,
u8* host_ptr);
VAddr GetCpuAddr() const override {
return cpu_addr;
}
std::size_t GetSizeInBytes() const override {
return size;
}
std::size_t GetSize() const {
return size;
}
u64 GetOffset() const {
return offset;
}
std::size_t GetAlignment() const {
return alignment;
}
// We do not have to flush this cache as things in it are never modified by us.
void Flush() override {}
VAddr addr;
std::size_t size;
u64 offset;
std::size_t alignment;
private:
VAddr cpu_addr{};
std::size_t size{};
u64 offset{};
std::size_t alignment{};
};
class VKBufferCache final : public RasterizerCache<std::shared_ptr<CachedBufferEntry>> {

View File

@@ -122,8 +122,7 @@ bool VKDevice::IsFormatSupported(vk::Format wanted_format, vk::FormatFeatureFlag
FormatType format_type) const {
const auto it = format_properties.find(wanted_format);
if (it == format_properties.end()) {
LOG_CRITICAL(Render_Vulkan, "Unimplemented format query={}",
static_cast<u32>(wanted_format));
LOG_CRITICAL(Render_Vulkan, "Unimplemented format query={}", vk::to_string(wanted_format));
UNREACHABLE();
return true;
}
@@ -219,11 +218,19 @@ std::map<vk::Format, vk::FormatProperties> VKDevice::GetFormatProperties(
format_properties.emplace(format, physical.getFormatProperties(format, dldi));
};
AddFormatQuery(vk::Format::eA8B8G8R8UnormPack32);
AddFormatQuery(vk::Format::eR5G6B5UnormPack16);
AddFormatQuery(vk::Format::eB5G6R5UnormPack16);
AddFormatQuery(vk::Format::eA2B10G10R10UnormPack32);
AddFormatQuery(vk::Format::eR8G8B8A8Srgb);
AddFormatQuery(vk::Format::eR8Unorm);
AddFormatQuery(vk::Format::eD32Sfloat);
AddFormatQuery(vk::Format::eD16Unorm);
AddFormatQuery(vk::Format::eD16UnormS8Uint);
AddFormatQuery(vk::Format::eD24UnormS8Uint);
AddFormatQuery(vk::Format::eD32SfloatS8Uint);
AddFormatQuery(vk::Format::eBc1RgbaUnormBlock);
AddFormatQuery(vk::Format::eBc2UnormBlock);
AddFormatQuery(vk::Format::eBc3UnormBlock);
AddFormatQuery(vk::Format::eBc4UnormBlock);
return format_properties;
}

View File

@@ -0,0 +1,81 @@
// Copyright 2019 yuzu Emulator Project
// Licensed under GPLv2 or any later version
// Refer to the license.txt file included.
#include <cstring>
#include <optional>
#include <unordered_map>
#include "common/assert.h"
#include "common/cityhash.h"
#include "video_core/renderer_vulkan/declarations.h"
#include "video_core/renderer_vulkan/maxwell_to_vk.h"
#include "video_core/renderer_vulkan/vk_sampler_cache.h"
#include "video_core/textures/texture.h"
namespace Vulkan {
static std::optional<vk::BorderColor> TryConvertBorderColor(std::array<float, 4> color) {
// TODO(Rodrigo): Manage integer border colors
if (color == std::array<float, 4>{0, 0, 0, 0}) {
return vk::BorderColor::eFloatTransparentBlack;
} else if (color == std::array<float, 4>{0, 0, 0, 1}) {
return vk::BorderColor::eFloatOpaqueBlack;
} else if (color == std::array<float, 4>{1, 1, 1, 1}) {
return vk::BorderColor::eFloatOpaqueWhite;
} else {
return {};
}
}
std::size_t SamplerCacheKey::Hash() const {
static_assert(sizeof(raw) % sizeof(u64) == 0);
return static_cast<std::size_t>(
Common::CityHash64(reinterpret_cast<const char*>(raw.data()), sizeof(raw) / sizeof(u64)));
}
bool SamplerCacheKey::operator==(const SamplerCacheKey& rhs) const {
return raw == rhs.raw;
}
VKSamplerCache::VKSamplerCache(const VKDevice& device) : device{device} {}
VKSamplerCache::~VKSamplerCache() = default;
vk::Sampler VKSamplerCache::GetSampler(const Tegra::Texture::TSCEntry& tsc) {
const auto [entry, is_cache_miss] = cache.try_emplace(SamplerCacheKey{tsc});
auto& sampler = entry->second;
if (is_cache_miss) {
sampler = CreateSampler(tsc);
}
return *sampler;
}
UniqueSampler VKSamplerCache::CreateSampler(const Tegra::Texture::TSCEntry& tsc) {
const float max_anisotropy = tsc.GetMaxAnisotropy();
const bool has_anisotropy = max_anisotropy > 1.0f;
const auto border_color = tsc.GetBorderColor();
const auto vk_border_color = TryConvertBorderColor(border_color);
UNIMPLEMENTED_IF_MSG(!vk_border_color, "Unimplemented border color {} {} {} {}",
border_color[0], border_color[1], border_color[2], border_color[3]);
constexpr bool unnormalized_coords = false;
const vk::SamplerCreateInfo sampler_ci(
{}, MaxwellToVK::Sampler::Filter(tsc.mag_filter),
MaxwellToVK::Sampler::Filter(tsc.min_filter),
MaxwellToVK::Sampler::MipmapMode(tsc.mipmap_filter),
MaxwellToVK::Sampler::WrapMode(tsc.wrap_u), MaxwellToVK::Sampler::WrapMode(tsc.wrap_v),
MaxwellToVK::Sampler::WrapMode(tsc.wrap_p), tsc.GetLodBias(), has_anisotropy,
max_anisotropy, tsc.depth_compare_enabled,
MaxwellToVK::Sampler::DepthCompareFunction(tsc.depth_compare_func), tsc.GetMinLod(),
tsc.GetMaxLod(), vk_border_color.value_or(vk::BorderColor::eFloatTransparentBlack),
unnormalized_coords);
const auto& dld = device.GetDispatchLoader();
const auto dev = device.GetLogical();
return dev.createSamplerUnique(sampler_ci, nullptr, dld);
}
} // namespace Vulkan

View File

@@ -0,0 +1,56 @@
// Copyright 2019 yuzu Emulator Project
// Licensed under GPLv2 or any later version
// Refer to the license.txt file included.
#pragma once
#include <unordered_map>
#include "common/common_types.h"
#include "video_core/renderer_vulkan/declarations.h"
#include "video_core/textures/texture.h"
namespace Vulkan {
class VKDevice;
struct SamplerCacheKey final : public Tegra::Texture::TSCEntry {
std::size_t Hash() const;
bool operator==(const SamplerCacheKey& rhs) const;
bool operator!=(const SamplerCacheKey& rhs) const {
return !operator==(rhs);
}
};
} // namespace Vulkan
namespace std {
template <>
struct hash<Vulkan::SamplerCacheKey> {
std::size_t operator()(const Vulkan::SamplerCacheKey& k) const noexcept {
return k.Hash();
}
};
} // namespace std
namespace Vulkan {
class VKSamplerCache {
public:
explicit VKSamplerCache(const VKDevice& device);
~VKSamplerCache();
vk::Sampler GetSampler(const Tegra::Texture::TSCEntry& tsc);
private:
UniqueSampler CreateSampler(const Tegra::Texture::TSCEntry& tsc);
const VKDevice& device;
std::unordered_map<SamplerCacheKey, UniqueSampler> cache;
};
} // namespace Vulkan

View File

@@ -165,6 +165,7 @@ u32 ShaderIR::DecodeInstr(NodeBlock& bb, u32 pc) {
{OpCode::Type::Hfma2, &ShaderIR::DecodeHfma2},
{OpCode::Type::Conversion, &ShaderIR::DecodeConversion},
{OpCode::Type::Memory, &ShaderIR::DecodeMemory},
{OpCode::Type::Texture, &ShaderIR::DecodeTexture},
{OpCode::Type::FloatSetPredicate, &ShaderIR::DecodeFloatSetPredicate},
{OpCode::Type::IntegerSetPredicate, &ShaderIR::DecodeIntegerSetPredicate},
{OpCode::Type::HalfSetPredicate, &ShaderIR::DecodeHalfSetPredicate},

View File

@@ -17,24 +17,6 @@ using Tegra::Shader::Attribute;
using Tegra::Shader::Instruction;
using Tegra::Shader::OpCode;
using Tegra::Shader::Register;
using Tegra::Shader::TextureMiscMode;
using Tegra::Shader::TextureProcessMode;
using Tegra::Shader::TextureType;
static std::size_t GetCoordCount(TextureType texture_type) {
switch (texture_type) {
case TextureType::Texture1D:
return 1;
case TextureType::Texture2D:
return 2;
case TextureType::Texture3D:
case TextureType::TextureCube:
return 3;
default:
UNIMPLEMENTED_MSG("Unhandled texture type: {}", static_cast<u32>(texture_type));
return 0;
}
}
u32 ShaderIR::DecodeMemory(NodeBlock& bb, u32 pc) {
const Instruction instr = {program_code[pc]};
@@ -247,194 +229,6 @@ u32 ShaderIR::DecodeMemory(NodeBlock& bb, u32 pc) {
}
break;
}
case OpCode::Id::TEX: {
UNIMPLEMENTED_IF_MSG(instr.tex.UsesMiscMode(TextureMiscMode::AOFFI),
"AOFFI is not implemented");
if (instr.tex.UsesMiscMode(TextureMiscMode::NODEP)) {
LOG_WARNING(HW_GPU, "TEX.NODEP implementation is incomplete");
}
const TextureType texture_type{instr.tex.texture_type};
const bool is_array = instr.tex.array != 0;
const bool depth_compare = instr.tex.UsesMiscMode(TextureMiscMode::DC);
const auto process_mode = instr.tex.GetTextureProcessMode();
WriteTexInstructionFloat(
bb, instr, GetTexCode(instr, texture_type, process_mode, depth_compare, is_array));
break;
}
case OpCode::Id::TEXS: {
const TextureType texture_type{instr.texs.GetTextureType()};
const bool is_array{instr.texs.IsArrayTexture()};
const bool depth_compare = instr.texs.UsesMiscMode(TextureMiscMode::DC);
const auto process_mode = instr.texs.GetTextureProcessMode();
if (instr.texs.UsesMiscMode(TextureMiscMode::NODEP)) {
LOG_WARNING(HW_GPU, "TEXS.NODEP implementation is incomplete");
}
const Node4 components =
GetTexsCode(instr, texture_type, process_mode, depth_compare, is_array);
if (instr.texs.fp32_flag) {
WriteTexsInstructionFloat(bb, instr, components);
} else {
WriteTexsInstructionHalfFloat(bb, instr, components);
}
break;
}
case OpCode::Id::TLD4: {
ASSERT(instr.tld4.array == 0);
UNIMPLEMENTED_IF_MSG(instr.tld4.UsesMiscMode(TextureMiscMode::AOFFI),
"AOFFI is not implemented");
UNIMPLEMENTED_IF_MSG(instr.tld4.UsesMiscMode(TextureMiscMode::NDV),
"NDV is not implemented");
UNIMPLEMENTED_IF_MSG(instr.tld4.UsesMiscMode(TextureMiscMode::PTP),
"PTP is not implemented");
if (instr.tld4.UsesMiscMode(TextureMiscMode::NODEP)) {
LOG_WARNING(HW_GPU, "TLD4.NODEP implementation is incomplete");
}
const auto texture_type = instr.tld4.texture_type.Value();
const bool depth_compare = instr.tld4.UsesMiscMode(TextureMiscMode::DC);
const bool is_array = instr.tld4.array != 0;
WriteTexInstructionFloat(bb, instr,
GetTld4Code(instr, texture_type, depth_compare, is_array));
break;
}
case OpCode::Id::TLD4S: {
UNIMPLEMENTED_IF_MSG(instr.tld4s.UsesMiscMode(TextureMiscMode::AOFFI),
"AOFFI is not implemented");
if (instr.tld4s.UsesMiscMode(TextureMiscMode::NODEP)) {
LOG_WARNING(HW_GPU, "TLD4S.NODEP implementation is incomplete");
}
const bool depth_compare = instr.tld4s.UsesMiscMode(TextureMiscMode::DC);
const Node op_a = GetRegister(instr.gpr8);
const Node op_b = GetRegister(instr.gpr20);
// TODO(Subv): Figure out how the sampler type is encoded in the TLD4S instruction.
std::vector<Node> coords;
if (depth_compare) {
// Note: TLD4S coordinate encoding works just like TEXS's
const Node op_y = GetRegister(instr.gpr8.Value() + 1);
coords.push_back(op_a);
coords.push_back(op_y);
coords.push_back(op_b);
} else {
coords.push_back(op_a);
coords.push_back(op_b);
}
std::vector<Node> extras;
extras.push_back(Immediate(static_cast<u32>(instr.tld4s.component)));
const auto& sampler =
GetSampler(instr.sampler, TextureType::Texture2D, false, depth_compare);
Node4 values;
for (u32 element = 0; element < values.size(); ++element) {
auto coords_copy = coords;
MetaTexture meta{sampler, {}, {}, extras, element};
values[element] = Operation(OperationCode::TextureGather, meta, std::move(coords_copy));
}
WriteTexsInstructionFloat(bb, instr, values);
break;
}
case OpCode::Id::TXQ: {
if (instr.txq.UsesMiscMode(TextureMiscMode::NODEP)) {
LOG_WARNING(HW_GPU, "TXQ.NODEP implementation is incomplete");
}
// TODO: The new commits on the texture refactor, change the way samplers work.
// Sadly, not all texture instructions specify the type of texture their sampler
// uses. This must be fixed at a later instance.
const auto& sampler =
GetSampler(instr.sampler, Tegra::Shader::TextureType::Texture2D, false, false);
u32 indexer = 0;
switch (instr.txq.query_type) {
case Tegra::Shader::TextureQueryType::Dimension: {
for (u32 element = 0; element < 4; ++element) {
if (!instr.txq.IsComponentEnabled(element)) {
continue;
}
MetaTexture meta{sampler, {}, {}, {}, element};
const Node value =
Operation(OperationCode::TextureQueryDimensions, meta, GetRegister(instr.gpr8));
SetTemporal(bb, indexer++, value);
}
for (u32 i = 0; i < indexer; ++i) {
SetRegister(bb, instr.gpr0.Value() + i, GetTemporal(i));
}
break;
}
default:
UNIMPLEMENTED_MSG("Unhandled texture query type: {}",
static_cast<u32>(instr.txq.query_type.Value()));
}
break;
}
case OpCode::Id::TMML: {
UNIMPLEMENTED_IF_MSG(instr.tmml.UsesMiscMode(Tegra::Shader::TextureMiscMode::NDV),
"NDV is not implemented");
if (instr.tmml.UsesMiscMode(TextureMiscMode::NODEP)) {
LOG_WARNING(HW_GPU, "TMML.NODEP implementation is incomplete");
}
auto texture_type = instr.tmml.texture_type.Value();
const bool is_array = instr.tmml.array != 0;
const auto& sampler = GetSampler(instr.sampler, texture_type, is_array, false);
std::vector<Node> coords;
// TODO: Add coordinates for different samplers once other texture types are implemented.
switch (texture_type) {
case TextureType::Texture1D:
coords.push_back(GetRegister(instr.gpr8));
break;
case TextureType::Texture2D:
coords.push_back(GetRegister(instr.gpr8.Value() + 0));
coords.push_back(GetRegister(instr.gpr8.Value() + 1));
break;
default:
UNIMPLEMENTED_MSG("Unhandled texture type {}", static_cast<u32>(texture_type));
// Fallback to interpreting as a 2D texture for now
coords.push_back(GetRegister(instr.gpr8.Value() + 0));
coords.push_back(GetRegister(instr.gpr8.Value() + 1));
texture_type = TextureType::Texture2D;
}
for (u32 element = 0; element < 2; ++element) {
auto params = coords;
MetaTexture meta{sampler, {}, {}, {}, element};
const Node value = Operation(OperationCode::TextureQueryLod, meta, std::move(params));
SetTemporal(bb, element, value);
}
for (u32 element = 0; element < 2; ++element) {
SetRegister(bb, instr.gpr0.Value() + element, GetTemporal(element));
}
break;
}
case OpCode::Id::TLDS: {
const Tegra::Shader::TextureType texture_type{instr.tlds.GetTextureType()};
const bool is_array{instr.tlds.IsArrayTexture()};
UNIMPLEMENTED_IF_MSG(instr.tlds.UsesMiscMode(TextureMiscMode::AOFFI),
"AOFFI is not implemented");
UNIMPLEMENTED_IF_MSG(instr.tlds.UsesMiscMode(TextureMiscMode::MZ), "MZ is not implemented");
if (instr.tlds.UsesMiscMode(TextureMiscMode::NODEP)) {
LOG_WARNING(HW_GPU, "TLDS.NODEP implementation is incomplete");
}
WriteTexsInstructionFloat(bb, instr, GetTldsCode(instr, texture_type, is_array));
break;
}
default:
UNIMPLEMENTED_MSG("Unhandled memory instruction: {}", opcode->get().GetName());
}
@@ -442,291 +236,4 @@ u32 ShaderIR::DecodeMemory(NodeBlock& bb, u32 pc) {
return pc;
}
const Sampler& ShaderIR::GetSampler(const Tegra::Shader::Sampler& sampler, TextureType type,
bool is_array, bool is_shadow) {
const auto offset = static_cast<std::size_t>(sampler.index.Value());
// If this sampler has already been used, return the existing mapping.
const auto itr =
std::find_if(used_samplers.begin(), used_samplers.end(),
[&](const Sampler& entry) { return entry.GetOffset() == offset; });
if (itr != used_samplers.end()) {
ASSERT(itr->GetType() == type && itr->IsArray() == is_array &&
itr->IsShadow() == is_shadow);
return *itr;
}
// Otherwise create a new mapping for this sampler
const std::size_t next_index = used_samplers.size();
const Sampler entry{offset, next_index, type, is_array, is_shadow};
return *used_samplers.emplace(entry).first;
}
void ShaderIR::WriteTexInstructionFloat(NodeBlock& bb, Instruction instr, const Node4& components) {
u32 dest_elem = 0;
for (u32 elem = 0; elem < 4; ++elem) {
if (!instr.tex.IsComponentEnabled(elem)) {
// Skip disabled components
continue;
}
SetTemporal(bb, dest_elem++, components[elem]);
}
// After writing values in temporals, move them to the real registers
for (u32 i = 0; i < dest_elem; ++i) {
SetRegister(bb, instr.gpr0.Value() + i, GetTemporal(i));
}
}
void ShaderIR::WriteTexsInstructionFloat(NodeBlock& bb, Instruction instr,
const Node4& components) {
// TEXS has two destination registers and a swizzle. The first two elements in the swizzle
// go into gpr0+0 and gpr0+1, and the rest goes into gpr28+0 and gpr28+1
u32 dest_elem = 0;
for (u32 component = 0; component < 4; ++component) {
if (!instr.texs.IsComponentEnabled(component))
continue;
SetTemporal(bb, dest_elem++, components[component]);
}
for (u32 i = 0; i < dest_elem; ++i) {
if (i < 2) {
// Write the first two swizzle components to gpr0 and gpr0+1
SetRegister(bb, instr.gpr0.Value() + i % 2, GetTemporal(i));
} else {
ASSERT(instr.texs.HasTwoDestinations());
// Write the rest of the swizzle components to gpr28 and gpr28+1
SetRegister(bb, instr.gpr28.Value() + i % 2, GetTemporal(i));
}
}
}
void ShaderIR::WriteTexsInstructionHalfFloat(NodeBlock& bb, Instruction instr,
const Node4& components) {
// TEXS.F16 destionation registers are packed in two registers in pairs (just like any half
// float instruction).
Node4 values;
u32 dest_elem = 0;
for (u32 component = 0; component < 4; ++component) {
if (!instr.texs.IsComponentEnabled(component))
continue;
values[dest_elem++] = components[component];
}
if (dest_elem == 0)
return;
std::generate(values.begin() + dest_elem, values.end(), [&]() { return Immediate(0); });
const Node first_value = Operation(OperationCode::HPack2, values[0], values[1]);
if (dest_elem <= 2) {
SetRegister(bb, instr.gpr0, first_value);
return;
}
SetTemporal(bb, 0, first_value);
SetTemporal(bb, 1, Operation(OperationCode::HPack2, values[2], values[3]));
SetRegister(bb, instr.gpr0, GetTemporal(0));
SetRegister(bb, instr.gpr28, GetTemporal(1));
}
Node4 ShaderIR::GetTextureCode(Instruction instr, TextureType texture_type,
TextureProcessMode process_mode, std::vector<Node> coords,
Node array, Node depth_compare, u32 bias_offset) {
const bool is_array = array;
const bool is_shadow = depth_compare;
UNIMPLEMENTED_IF_MSG((texture_type == TextureType::Texture3D && (is_array || is_shadow)) ||
(texture_type == TextureType::TextureCube && is_array && is_shadow),
"This method is not supported.");
const auto& sampler = GetSampler(instr.sampler, texture_type, is_array, is_shadow);
const bool lod_needed = process_mode == TextureProcessMode::LZ ||
process_mode == TextureProcessMode::LL ||
process_mode == TextureProcessMode::LLA;
// LOD selection (either via bias or explicit textureLod) not supported in GL for
// sampler2DArrayShadow and samplerCubeArrayShadow.
const bool gl_lod_supported =
!((texture_type == Tegra::Shader::TextureType::Texture2D && is_array && is_shadow) ||
(texture_type == Tegra::Shader::TextureType::TextureCube && is_array && is_shadow));
const OperationCode read_method =
lod_needed && gl_lod_supported ? OperationCode::TextureLod : OperationCode::Texture;
UNIMPLEMENTED_IF(process_mode != TextureProcessMode::None && !gl_lod_supported);
std::vector<Node> extras;
if (process_mode != TextureProcessMode::None && gl_lod_supported) {
if (process_mode == TextureProcessMode::LZ) {
extras.push_back(Immediate(0.0f));
} else {
// If present, lod or bias are always stored in the register indexed by the gpr20
// field with an offset depending on the usage of the other registers
extras.push_back(GetRegister(instr.gpr20.Value() + bias_offset));
}
}
Node4 values;
for (u32 element = 0; element < values.size(); ++element) {
auto copy_coords = coords;
MetaTexture meta{sampler, array, depth_compare, extras, element};
values[element] = Operation(read_method, meta, std::move(copy_coords));
}
return values;
}
Node4 ShaderIR::GetTexCode(Instruction instr, TextureType texture_type,
TextureProcessMode process_mode, bool depth_compare, bool is_array) {
const bool lod_bias_enabled =
(process_mode != TextureProcessMode::None && process_mode != TextureProcessMode::LZ);
const auto [coord_count, total_coord_count] = ValidateAndGetCoordinateElement(
texture_type, depth_compare, is_array, lod_bias_enabled, 4, 5);
// If enabled arrays index is always stored in the gpr8 field
const u64 array_register = instr.gpr8.Value();
// First coordinate index is the gpr8 or gpr8 + 1 when arrays are used
const u64 coord_register = array_register + (is_array ? 1 : 0);
std::vector<Node> coords;
for (std::size_t i = 0; i < coord_count; ++i) {
coords.push_back(GetRegister(coord_register + i));
}
// 1D.DC in OpenGL the 2nd component is ignored.
if (depth_compare && !is_array && texture_type == TextureType::Texture1D) {
coords.push_back(Immediate(0.0f));
}
const Node array = is_array ? GetRegister(array_register) : nullptr;
Node dc{};
if (depth_compare) {
// Depth is always stored in the register signaled by gpr20 or in the next register if lod
// or bias are used
const u64 depth_register = instr.gpr20.Value() + (lod_bias_enabled ? 1 : 0);
dc = GetRegister(depth_register);
}
return GetTextureCode(instr, texture_type, process_mode, coords, array, dc, 0);
}
Node4 ShaderIR::GetTexsCode(Instruction instr, TextureType texture_type,
TextureProcessMode process_mode, bool depth_compare, bool is_array) {
const bool lod_bias_enabled =
(process_mode != TextureProcessMode::None && process_mode != TextureProcessMode::LZ);
const auto [coord_count, total_coord_count] = ValidateAndGetCoordinateElement(
texture_type, depth_compare, is_array, lod_bias_enabled, 4, 4);
// If enabled arrays index is always stored in the gpr8 field
const u64 array_register = instr.gpr8.Value();
// First coordinate index is stored in gpr8 field or (gpr8 + 1) when arrays are used
const u64 coord_register = array_register + (is_array ? 1 : 0);
const u64 last_coord_register =
(is_array || !(lod_bias_enabled || depth_compare) || (coord_count > 2))
? static_cast<u64>(instr.gpr20.Value())
: coord_register + 1;
const u32 bias_offset = coord_count > 2 ? 1 : 0;
std::vector<Node> coords;
for (std::size_t i = 0; i < coord_count; ++i) {
const bool last = (i == (coord_count - 1)) && (coord_count > 1);
coords.push_back(GetRegister(last ? last_coord_register : coord_register + i));
}
const Node array = is_array ? GetRegister(array_register) : nullptr;
Node dc{};
if (depth_compare) {
// Depth is always stored in the register signaled by gpr20 or in the next register if lod
// or bias are used
const u64 depth_register = instr.gpr20.Value() + (lod_bias_enabled ? 1 : 0);
dc = GetRegister(depth_register);
}
return GetTextureCode(instr, texture_type, process_mode, coords, array, dc, bias_offset);
}
Node4 ShaderIR::GetTld4Code(Instruction instr, TextureType texture_type, bool depth_compare,
bool is_array) {
const std::size_t coord_count = GetCoordCount(texture_type);
const std::size_t total_coord_count = coord_count + (is_array ? 1 : 0);
const std::size_t total_reg_count = total_coord_count + (depth_compare ? 1 : 0);
// If enabled arrays index is always stored in the gpr8 field
const u64 array_register = instr.gpr8.Value();
// First coordinate index is the gpr8 or gpr8 + 1 when arrays are used
const u64 coord_register = array_register + (is_array ? 1 : 0);
std::vector<Node> coords;
for (size_t i = 0; i < coord_count; ++i)
coords.push_back(GetRegister(coord_register + i));
const auto& sampler = GetSampler(instr.sampler, texture_type, is_array, depth_compare);
Node4 values;
for (u32 element = 0; element < values.size(); ++element) {
auto coords_copy = coords;
MetaTexture meta{sampler, GetRegister(array_register), {}, {}, element};
values[element] = Operation(OperationCode::TextureGather, meta, std::move(coords_copy));
}
return values;
}
Node4 ShaderIR::GetTldsCode(Instruction instr, TextureType texture_type, bool is_array) {
const std::size_t type_coord_count = GetCoordCount(texture_type);
const bool lod_enabled = instr.tlds.GetTextureProcessMode() == TextureProcessMode::LL;
// If enabled arrays index is always stored in the gpr8 field
const u64 array_register = instr.gpr8.Value();
// if is array gpr20 is used
const u64 coord_register = is_array ? instr.gpr20.Value() : instr.gpr8.Value();
const u64 last_coord_register =
((type_coord_count > 2) || (type_coord_count == 2 && !lod_enabled)) && !is_array
? static_cast<u64>(instr.gpr20.Value())
: coord_register + 1;
std::vector<Node> coords;
for (std::size_t i = 0; i < type_coord_count; ++i) {
const bool last = (i == (type_coord_count - 1)) && (type_coord_count > 1);
coords.push_back(GetRegister(last ? last_coord_register : coord_register + i));
}
const Node array = is_array ? GetRegister(array_register) : nullptr;
// When lod is used always is in gpr20
const Node lod = lod_enabled ? GetRegister(instr.gpr20) : Immediate(0);
const auto& sampler = GetSampler(instr.sampler, texture_type, is_array, false);
Node4 values;
for (u32 element = 0; element < values.size(); ++element) {
auto coords_copy = coords;
MetaTexture meta{sampler, array, {}, {lod}, element};
values[element] = Operation(OperationCode::TexelFetch, meta, std::move(coords_copy));
}
return values;
}
std::tuple<std::size_t, std::size_t> ShaderIR::ValidateAndGetCoordinateElement(
TextureType texture_type, bool depth_compare, bool is_array, bool lod_bias_enabled,
std::size_t max_coords, std::size_t max_inputs) {
const std::size_t coord_count = GetCoordCount(texture_type);
std::size_t total_coord_count = coord_count + (is_array ? 1 : 0) + (depth_compare ? 1 : 0);
const std::size_t total_reg_count = total_coord_count + (lod_bias_enabled ? 1 : 0);
if (total_coord_count > max_coords || total_reg_count > max_inputs) {
UNIMPLEMENTED_MSG("Unsupported Texture operation");
total_coord_count = std::min(total_coord_count, max_coords);
}
// 1D.DC OpenGL is using a vec3 but 2nd component is ignored later.
total_coord_count +=
(depth_compare && !is_array && texture_type == TextureType::Texture1D) ? 1 : 0;
return {coord_count, total_coord_count};
}
} // namespace VideoCommon::Shader

Some files were not shown because too many files have changed in this diff Show More