
I am using MSS to capture screenshots of my screen because it is fast.

But I am not sure how to capture a specific window on a Mac. I know there is win32 for Windows users... The code I have now is just a constant loop capturing a fixed region of my main monitor.

main.py :

import cv2 as cv
import numpy as np
from time import time
from mss import mss


def window_capture():
    loop_time = time()

    with mss() as sct:
        monitor = {"top": 40, "left": 0, "width": 800, "height": 600}

        while True:

            # mss returns BGRA pixels; drop the alpha channel for OpenCV
            screenshot = np.array(sct.grab(monitor))
            screenshot = cv.cvtColor(screenshot, cv.COLOR_BGRA2BGR)

            cv.imshow('Computer Vision', screenshot)

            print('FPS {}'.format(1 / (time() - loop_time)))
            loop_time = time()

            if cv.waitKey(1) == ord('q'):
                cv.destroyAllWindows()
                break


window_capture()

print('Done.')
Blue

2 Answers


I wrote the following piece of Objective-C that gets the name, owner, window ID and on-screen position of every window in macOS. I saved it as windowlist.m and compiled it with the command in the comments at the top of the file:

////////////////////////////////////////////////////////////////////////////////
// windowlist.m
// Mark Setchell
//
// Get list of windows with their characteristics
//
// Compile with:
// clang windowlist.m -o windowlist -framework CoreGraphics -framework Cocoa
//
// Run with:
// ./windowlist
//
////////////////////////////////////////////////////////////////////////////////
#include <Cocoa/Cocoa.h>
#include <CoreGraphics/CGWindow.h>

int main(int argc, char **argv)
{
   NSArray *windows = (NSArray *)CGWindowListCopyWindowInfo(kCGWindowListOptionOnScreenOnly,kCGNullWindowID);
   for(NSDictionary *window in windows){
      int WindowNum = [[window objectForKey:(NSString *)kCGWindowNumber] intValue];
      NSString* OwnerName = [window objectForKey:(NSString *)kCGWindowOwnerName];
      int OwnerPID = [[window objectForKey:(NSString *) kCGWindowOwnerPID] intValue];
      NSString* WindowName= [window objectForKey:(NSString *)kCGWindowName];
      CFDictionaryRef bounds = (CFDictionaryRef)[window objectForKey:(NSString *)kCGWindowBounds];
      CGRect rect;
      CGRectMakeWithDictionaryRepresentation(bounds,&rect);
      printf("%s:%s:%d:%d:%f,%f,%f,%f\n",[OwnerName UTF8String],[WindowName UTF8String],WindowNum,OwnerPID,rect.origin.x,rect.origin.y,rect.size.height,rect.size.width);
   }
}

You can either run this program as is with Python's subprocess.Popen() and parse the window list, or you could maybe convert it to Python using the PyObjC module. It gives output like this, where each line has the form owner:window name:window number:owner PID:x,y,height,width, so the last 4 items are the window's top-left corner, height and width:

Location Menu:Item-0:4881:1886:1043.000000,0.000000,22.000000,28.000000
Backup and sync from Google:Item-0:1214:8771:1071.000000,0.000000,22.000000,30.000000
Dropbox:Item-0:451:1924:1101.000000,0.000000,22.000000,28.000000
NordVPN IKE:Item-0:447:1966:1129.000000,0.000000,22.000000,26.000000
PromiseUtilityDaemon:Item-0:395:1918:1155.000000,0.000000,22.000000,24.000000
SystemUIServer:AppleTimeMachineExtra:415:1836:1179.000000,0.000000,22.000000,40.000000
SystemUIServer:AppleBluetoothExtra:423:1836:1219.000000,0.000000,22.000000,30.000000
SystemUIServer:AirPortExtra:409:1836:1249.000000,0.000000,22.000000,30.000000
SystemUIServer:AppleVolumeExtra:427:1836:1279.000000,0.000000,22.000000,30.000000
SystemUIServer:BatteryExtra:405:1836:1309.000000,0.000000,22.000000,67.000000
SystemUIServer:AppleClockExtra:401:1836:1376.000000,0.000000,22.000000,123.000000
SystemUIServer:AppleUser:419:1836:1499.000000,0.000000,22.000000,99.000000
Spotlight:Item-0:432:1922:1598.000000,0.000000,22.000000,36.000000
SystemUIServer:NotificationCenter:391:1836:1634.000000,0.000000,22.000000,46.000000
Window Server:Menubar:353:253:0.000000,0.000000,22.000000,1680.000000
Dock:Dock:387:1835:0.000000,0.000000,1050.000000,1680.000000
Terminal:windowlist — -bash — 140×30:4105:6214:70.000000,285.000000,658.000000,1565.000000
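
The subprocess route could look roughly like the sketch below. It assumes the compiled helper sits at ./windowlist and prints lines in the format shown above; the function names parse_window_line and list_windows are made up for illustration. The parser splits from the right so that colons inside a window name survive, though an owner name containing a colon would still be misread.

```python
import subprocess


def parse_window_line(line):
    """Parse one 'owner:name:id:pid:x,y,height,width' line from windowlist."""
    # Split from the right so colons inside the window name are preserved.
    head, window_id, owner_pid, geometry = line.rstrip("\n").rsplit(":", 3)
    owner, _, name = head.partition(":")
    x, y, height, width = (float(v) for v in geometry.split(","))
    return {
        "owner": owner,
        "name": name,
        "id": int(window_id),
        "pid": int(owner_pid),
        # Region dict in the shape that mss's grab() accepts.
        "monitor": {"left": int(x), "top": int(y),
                    "width": int(width), "height": int(height)},
    }


def list_windows(binary="./windowlist"):
    """Run the compiled helper (assumed path) and parse each output line."""
    result = subprocess.run([binary], capture_output=True, text=True, check=True)
    return [parse_window_line(l) for l in result.stdout.splitlines() if l.strip()]


# Parsing one line taken from the sample output above.
dock = parse_window_line("Dock:Dock:387:1835:0.000000,0.000000,1050.000000,1680.000000")
print(dock["owner"], dock["monitor"])
```

The resulting "monitor" dict can be passed to sct.grab() in place of the hard-coded region from the question.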
Mark Setchell

If you want to open the Chrome browser, you can use Python's built-in webbrowser package. You will need to supply the path to the Chrome app, e.g.: webbrowser.get(r'open -a /Applications/Google\ Chrome.app %s').open('http://docs.python.org/')

Once the browser is open, its window will be wherever it was last positioned. MSS does not let you select an app. Instead, you can grab the entire screen or a fixed region (as you've specified with monitor = {"top": 40, "left": 0, "width": 800, "height": 600}). You might therefore want to force the browser into full screen. This can be achieved by using the pyautogui package to send the hotkeys.

import webbrowser
import pyautogui

def openApp(url, appPath):
    webbrowser.get(appPath).open(url)

def fullScreen():
    pyautogui.hotkey('command', 'ctrl', 'f')  # hotkey for full-screen mode on macOS

url = 'http://docs.python.org/'
# Raw string so the backslash that escapes the space is passed through unchanged
appPath = r'open -a /Applications/Google\ Chrome.app %s'  # macOS
#appPath = 'C:/Program Files (x86)/Google/Chrome/Application/chrome.exe %s'  # Windows
#appPath = '/usr/bin/google-chrome %s'  # Linux
openApp(url, appPath)
fullScreen()
# here you can add logic to take screenshots

(Note: I've only tested this on Windows, but it should work on macOS.)
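
As an alternative to going full screen, the region passed to MSS could be looked up from the window itself. The following is only a sketch, assuming the pyobjc-framework-Quartz package is installed on macOS; find_window_region, bounds_to_monitor and grab_window are hypothetical helper names, and matching on the owner name alone picks whichever of that app's windows is listed first. I have not checked how the coordinates behave on Retina or multi-monitor setups.

```python
def bounds_to_monitor(bounds):
    """Convert a kCGWindowBounds-style dict (X, Y, Width, Height) to an mss region."""
    return {"left": int(bounds["X"]), "top": int(bounds["Y"]),
            "width": int(bounds["Width"]), "height": int(bounds["Height"])}


def find_window_region(owner_name):
    """Return an mss region for the first on-screen window owned by owner_name."""
    # Imported lazily: Quartz ships with pyobjc-framework-Quartz (macOS only).
    import Quartz

    windows = Quartz.CGWindowListCopyWindowInfo(
        Quartz.kCGWindowListOptionOnScreenOnly, Quartz.kCGNullWindowID)
    for window in windows:
        if window.get(Quartz.kCGWindowOwnerName) == owner_name:
            return bounds_to_monitor(window[Quartz.kCGWindowBounds])
    return None  # no on-screen window with that owner


def grab_window(owner_name):
    """Grab the window's region with mss and return it as a NumPy array."""
    import numpy as np
    from mss import mss

    region = find_window_region(owner_name)
    if region is None:
        raise LookupError("no on-screen window owned by " + owner_name)
    with mss() as sct:
        return np.array(sct.grab(region))


# The pure helper can be tried without macOS:
print(bounds_to_monitor({"X": 70.0, "Y": 285.0, "Width": 1565.0, "Height": 658.0}))
```

Calling find_window_region inside the capture loop would let the region follow the window if it is moved, at the cost of one extra lookup per frame.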

Greg
  • I like this, but I am trying to open an app. If you are curious which app, it would be RuneScape. – Blue Jul 09 '20 at 17:49
  • I will mark this as the answer though, because it gave me a better general idea of where to go. I think I can use something like import os and then os.startfile? – Blue Jul 09 '20 at 17:50
  • os.startfile may be Windows-only. This should work: os.system('C:\dev\HelloWorld.exe'). See https://stackoverflow.com/questions/13222808/how-to-run-external-executable-using-python/13222809 – Greg Jul 09 '20 at 18:37
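
For launching an app in a way that does not depend on os.startfile, something along these lines might work. It is a sketch only: launch_command and launch are hypothetical names, "RuneScape" is a guess at the app's name as macOS knows it, and the Windows branch assumes the usual cmd "start" built-in.

```python
import subprocess
import sys


def launch_command(app, platform=sys.platform):
    """Build the command used to start an application on the given platform."""
    if platform == "darwin":
        # macOS: 'open -a' launches an app by name or by path to its .app bundle.
        return ["open", "-a", app]
    if platform == "win32":
        # Windows: 'start' is a cmd built-in; the empty string is the window title.
        return ["cmd", "/c", "start", "", app]
    # Elsewhere, assume 'app' is an executable on PATH or a full path.
    return [app]


def launch(app):
    """Start the application and raise if the launcher command fails."""
    subprocess.run(launch_command(app), check=True)


print(launch_command("RuneScape", platform="darwin"))
```

After launch() returns, the window may still take a moment to appear, so the capture loop probably needs a short wait or a retry before looking for it.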