Modeling Class Probabilities with Logistic Regression

뱃놀이가자 2023. 11. 27. 00:49

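Logistic regression is a powerful algorithm for linear binary classification problems.
Note that, despite its name, it is a classification model.

When the problem is not binary, softmax regression (multinomial logistic regression) can be used instead.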

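The logistic function can be obtained as the inverse of the log of the odds (the logit), where the odds are the ratio p / (1 - p) of the probability p that an event occurs to the probability that it does not.
This function is called the logistic sigmoid function, or simply the sigmoid function.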
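The inverse relationship between the log-odds and the sigmoid can be checked numerically. The following is a minimal sketch; the helper names `logit` and `sigmoid` are chosen here for illustration and are not from the original post.

```python
import numpy as np

def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return np.log(p / (1.0 - p))

def sigmoid(t):
    """Logistic sigmoid, the inverse of logit."""
    return 1.0 / (1.0 + np.exp(-t))

p = np.array([0.1, 0.5, 0.9])
t = logit(p)         # log-odds; p = 0.5 maps to t = 0
p_back = sigmoid(t)  # applying the sigmoid recovers the probabilities

print(t)
print(p_back)
```

Probabilities above 0.5 map to positive log-odds and probabilities below 0.5 to negative log-odds, which is why the sign of t is enough to pick a class.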

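Substituting x.transpose * theta (the linear combination of the features x and the parameters theta) for t makes the function more intuitive to use for binary classification.
In any case, a sample is classified as 1 when t > 0 and as 0 when t < 0. At t = 0 the sigmoid gives exactly 0.5, so the case is ambiguous; since a continuous variable takes any single exact value with probability zero, that case is not defined separately.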
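The decision rule can be sketched as follows. The parameter vector `theta` and the samples in `X` are made-up values for illustration only, and sending the t = 0 tie to class 0 is an arbitrary choice of this sketch.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Hypothetical parameters and samples; the first column of X is a constant 1
# so that theta[0] acts as the intercept.
theta = np.array([-1.0, 2.0])
X = np.array([[1.0, 0.0],
              [1.0, 0.5],
              [1.0, 2.0]])

t = X @ theta                 # x^T theta for each sample: [-1, 0, 3]
proba = sigmoid(t)            # estimated P(y = 1 | x)
labels = (t > 0).astype(int)  # class 1 when t > 0; the t == 0 tie goes to class 0 here

print(t, proba, labels)
```

Thresholding t at 0 is equivalent to thresholding the probability at 0.5, because the sigmoid is monotonically increasing and equals 0.5 at t = 0.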

The handwriting is a little messy, but the derivative of the log-likelihood
can be obtained through the following steps.
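For reference, a standard version of this derivation is sketched below. It assumes labels y_i in {0, 1} and writes the sigmoid as \sigma; the notation may differ from the author's handwritten version.

```latex
\begin{aligned}
\ell(\theta) &= \sum_{i=1}^{n} \Big[ y_i \log \sigma(x_i^\top \theta)
               + (1 - y_i) \log\big(1 - \sigma(x_i^\top \theta)\big) \Big],
\qquad \sigma(t) = \frac{1}{1 + e^{-t}}, \\
\sigma'(t) &= \sigma(t)\,\big(1 - \sigma(t)\big), \\
\frac{\partial \ell}{\partial \theta}
  &= \sum_{i=1}^{n} \Big[ y_i \big(1 - \sigma(x_i^\top \theta)\big)
     - (1 - y_i)\, \sigma(x_i^\top \theta) \Big] x_i
   = \sum_{i=1}^{n} \big( y_i - \sigma(x_i^\top \theta) \big)\, x_i .
\end{aligned}
```

Each sample contributes its feature vector weighted by the prediction error y_i - \sigma(x_i^\top \theta).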

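The parameters are then chosen to maximize the log-likelihood (equivalently, to minimize the negative log-likelihood), moving along this derivative until it is driven to zero.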
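A minimal gradient-ascent sketch of this idea is shown below, using the standard gradient X^T (y - sigmoid(X theta)) of the log-likelihood. The toy data, the learning rate `lr`, and the iteration count are arbitrary assumptions; scikit-learn itself uses more sophisticated solvers.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def log_likelihood(theta, X, y):
    p = sigmoid(X @ theta)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def gradient(theta, X, y):
    # Derivative of the log-likelihood with respect to theta.
    return X.T @ (y - sigmoid(X @ theta))

# Toy, non-separable 1-D data; the first column is the intercept term.
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 0.5],
              [1.0, 1.0], [1.0, 1.5], [1.0, 3.0]])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])

theta = np.zeros(2)
lr = 0.1  # assumed step size
ll_start = log_likelihood(theta, X, y)
for _ in range(5000):
    theta += lr * gradient(theta, X, y)  # step uphill on the log-likelihood
ll_end = log_likelihood(theta, X, y)
grad_norm = np.linalg.norm(gradient(theta, X, y))

print(theta, ll_start, ll_end, grad_norm)
```

The log-likelihood increases from its starting value, and the gradient norm shrinks toward zero as theta approaches the maximum-likelihood estimate.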

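import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Assumption: the post does not show how iris_X, iris_y and iris_y2 were built.
# Here they are loaded from scikit-learn, and iris_y2 is taken to be a binary
# "virginica vs. rest" label.
iris_X, iris_y = load_iris(return_X_y=True, as_frame=True)
iris_y2 = (iris_y == 2).astype(int)

# Use only the first two features (sepal length, sepal width).
X = iris_X.to_numpy()[:, :2]
Y = iris_y2

# Create an instance of Logistic Regression Classifier and fit the data.
logreg = LogisticRegression(C=1e5, fit_intercept=True)
logreg.fit(X, Y)

# Build a grid over the feature range (step h) and predict the class at every grid point.
h = .02
x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5
y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
Z = logreg.predict(np.c_[xx.ravel(), yy.ravel()])

# Put the result into a color plot
Z = Z.reshape(xx.shape)
plt.pcolormesh(xx, yy, Z, cmap=plt.cm.Paired)

# Plot also the training points
plt.scatter(X[:, 0], X[:, 1], c=Y, edgecolors='k', cmap=plt.cm.Paired)
plt.xlabel('Sepal length')
plt.ylabel('Sepal width')

plt.show()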

The code above applies logistic regression with scikit-learn.

C : float, default=1.0
    Inverse of regularization strength; must be a positive float.
    Like in support vector machines, smaller values specify stronger
    regularization.

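The smaller the value of the parameter C, the stronger the regularization. In other words, the emphasis is on generalization, and accuracy on the training data may decrease.
Conversely, the larger the value of C, the weaker the regularization, which produces a more complex model that fits the training set closely rather than generalizing.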
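The effect of C can be seen by comparing the size of the fitted coefficients. This is a sketch under assumptions: the binary "virginica vs. rest" target, the C values, and `max_iter=1000` are choices made here for illustration and are not taken from the original post.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
X = X[:, :2]                  # sepal length, sepal width
y_bin = (y == 2).astype(int)  # assumed binary target: virginica vs. rest

coef_norms = {}
for C in (0.01, 1.0, 100.0):
    clf = LogisticRegression(C=C, max_iter=1000).fit(X, y_bin)
    coef_norms[C] = np.linalg.norm(clf.coef_)
    print(f"C={C:>6}: ||coef|| = {coef_norms[C]:.3f}, "
          f"train accuracy = {clf.score(X, y_bin):.3f}")
```

A small C shrinks the coefficients toward zero (a simpler, more strongly regularized model), while a large C lets them grow to fit the training data more closely.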

The type of regularization applied to logistic regression can also be selected, via the penalty parameter.

penalty : {'l1', 'l2', 'elasticnet', None}, default='l2'
    Specify the norm of the penalty:

    - `None`: no penalty is added;
    - `'l2'`: add a L2 penalty term and it is the default choice;
    - `'l1'`: add a L1 penalty term;
    - `'elasticnet'`: both L1 and L2 penalty terms are added.

    .. warning::
       Some penalties may not work with some solvers. See the parameter
       `solver` below, to know the compatibility between the penalty and
       solver.

    .. versionadded:: 0.19
       l1 penalty with SAGA solver (allowing 'multinomial' + L1)

    .. deprecated:: 1.2
       The 'none' option was deprecated in version 1.2, and will be removed
       in 1.4. Use `None` instead.
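As the warning in the docstring notes, the penalty has to be compatible with the solver. The sketch below pairs both 'l1' and 'l2' with the `liblinear` solver, which supports both; the binary target and `C=0.1` are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
y_bin = (y == 2).astype(int)  # assumed binary target: virginica vs. rest

# liblinear supports both L1 and L2 penalties for binary problems.
clf_l1 = LogisticRegression(penalty='l1', solver='liblinear', C=0.1).fit(X, y_bin)
clf_l2 = LogisticRegression(penalty='l2', solver='liblinear', C=0.1).fit(X, y_bin)

n_zero_l1 = int(np.sum(clf_l1.coef_ == 0))
n_zero_l2 = int(np.sum(clf_l2.coef_ == 0))
print("L1 coefficients:", clf_l1.coef_, "zeros:", n_zero_l1)
print("L2 coefficients:", clf_l2.coef_, "zeros:", n_zero_l2)
```

An L1 penalty tends to drive some coefficients exactly to zero, whereas an L2 penalty shrinks them without usually zeroing them out.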

The softmax function works through the following steps.
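In its standard form, softmax exponentiates each class score and divides by the sum of the exponentials over all classes, so the outputs are positive and sum to 1. A minimal NumPy sketch follows; the scores are made-up values, and subtracting the row maximum is a common numerical-stability trick that does not change the result.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax of a 2-D array of class scores."""
    z = np.asarray(z, dtype=float)
    shifted = z - z.max(axis=-1, keepdims=True)  # avoids overflow in exp
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

scores = np.array([[2.0, 1.0, 0.1],   # hypothetical scores for 3 classes
                   [0.0, 0.0, 0.0]])  # equal scores give equal probabilities
probs = softmax(scores)

print(probs)
print(probs.sum(axis=1))
```

The class with the largest score always receives the largest probability, so taking the argmax of the probabilities is the same as taking the argmax of the scores.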

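# https://scikit-learn.org/stable/auto_examples/linear_model/plot_iris_logistic.html
from sklearn.linear_model import LogisticRegression

# Create an instance of Softmax and fit the data (all three classes).
# X, xx and yy are reused from the binary example above.
# Note: multi_class is deprecated since scikit-learn 1.5, where multinomial
# is already the default behavior for multiclass targets.
logreg = LogisticRegression(C=1e5, multi_class='multinomial')
logreg.fit(X, iris_y)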
Z = logreg.predict(np.c_[xx.ravel(), yy.ravel()])

# Put the result into a color plot
Z = Z.reshape(xx.shape)
plt.pcolormesh(xx, yy, Z, cmap=plt.cm.Paired)

# Plot also the training points
plt.scatter(X[:, 0], X[:, 1], c=iris_y, edgecolors='k', cmap=plt.cm.Paired)
plt.xlabel('Sepal length')
plt.ylabel('Sepal width')

plt.show()

The point to focus on is the multinomial setting.

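The probability that each sample belongs to each class can be obtained with the predict_proba method.
Below are the probabilities and the predicted classes when the softmax function is used. (Because they are probabilities, each row sums to 1.)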

logreg.predict_proba(X[:5,:])
logreg.predict_proba(X[:5,:]).argmax(axis=1)
# = logreg.predict(X[:5,:])

array([[0.92347315, 0.0585081 , 0.01801875],
       [0.791565  , 0.18091265, 0.02752235],
       [0.94236404, 0.05086345, 0.00677251],
       [0.94055354, 0.05375943, 0.00568703],
       [0.96185313, 0.02961439, 0.00853248]])

array([0, 0, 0, 0, 0], dtype=int64)
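These properties can be verified directly. The sketch below refits a comparable model from scratch and assumes a scikit-learn version in which the default solver fits a multinomial (softmax) model for multiclass targets; `max_iter=10000` is an assumption added to help convergence with weak regularization.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
X = X[:, :2]  # sepal length, sepal width
clf = LogisticRegression(C=1e5, max_iter=10000).fit(X, y)

proba = clf.predict_proba(X[:5])
scores = clf.decision_function(X[:5])  # one linear score per class

# Softmax of the linear scores, computed by hand.
exp = np.exp(scores - scores.max(axis=1, keepdims=True))
manual = exp / exp.sum(axis=1, keepdims=True)

print(proba.sum(axis=1))
print(clf.classes_[proba.argmax(axis=1)], clf.predict(X[:5]))
```

Each row of `proba` sums to 1, the hand-computed softmax reproduces `predict_proba`, and the argmax over classes matches `predict`.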

License: Attribution, NonCommercial
